[ Jocelyn Ireson-Paine's Home Page | Publications | Dobbs Code Talk Index | Dobbs Blog Version ]

What Might Category Theory do for Artificial Intelligence and Cognitive Science?

Introduction

In August 2008, I posted a series of answers to the question "Why should we be interested in category theory?" on the A Categorical Manifesto thread in the n-Category Café blog. Category theory is a mathematical tool often used to elucidate similarities between apparently unrelated pieces of mathematics. I suggested it could do the same for AI and cognitive science, and discussed examples that include neural nets, holographic reduced representations, Prolog-stye unification, analogical reasoning, and understanding metaphors. Here is the same posting, with an informal explanation of category theory added, and the rest made intelligible (I hope) to non-category-theorists.

What is category theory?
Category theory and the Interesting Truths
Unifying unification with holographic reduced representations
Unifying neural nets with logic
Unifying dynamical systems with logic
Understanding analogical reasoning — the core of cognition and a chemistry of concepts?
"A necessary tool in the construction of an adequately explicit science of knowing"
By the way — spreadsheets!

What is category theory?

A "category" is a mathematical structure, in the same sense that a vector space, a group, or a set is. Vector spaces contain vectors: groups contain elements that represent symmetry operations; sets contain points. And a category contains "objects" related by "arrows". Visualise it as dots connected by arrows.

These objects and arrows must follow rules. Vector spaces must also: they have to have a null vector and obey linearity constraints. Groups must contain an inverse for each element, and an identity that leaves elements unchanged when you multiply by it. And all categories must follow the rule that if there is an arrow from object A to object B, and an going from object B to object C, there must also be a composite of these from A to C. The categories must follow other rules too, but that is an important one. Visualise the fact that if you can put a pig through a mincer to make mince, and mince through a sausage machine to make sausages, you can put a pig through (a mincer connected to a sausage machine) to make sausages.

One can study the consequences of these rules for their own sake. But many users of this mathematical "app" treat categories as mathematical workspaces. The objects will stand for mathematical entities such as vector spaces, groups, or sets. Or for computational entities such as programs, neural networks, concept hierarchies, or ontologies. They can even stand for categories — which makes category theory horribly and fascinatingly recursive.

An arrow in such a "workspace" category will usually stand for some kind of transformation or function or process or relation. But almost always, if not always, an arrow doesn't represent any old transformation or function or process or relation. It represents a transformation or function or process or relation that "preserves structure".

Just how the arrows preserve structure depends on the category. Mathematicians will know what I mean if I say that in the category whose objects are groups, all arrows (which by definition, transform one group to another group) map the identity element in the source group to the identity element in the target group. In a category whose objects were jokes, one would probably want all arrows (which by definition, would transform one joke to another joke) to map the punchline in the source joke to the punchline in the target joke. In a category whose objects were trains, one might make all arrows map the engine of the source train to the engine of the target train.

The objects and arrows in a category draw a picture of the structure-preserving transformations that exist within a coherent collection of mathematical entities. Things get more interesting when we have two categories. For example, we might have a category of concept hierarchies and a category of neural nets, and map the hierarchies onto the nets such that the transformations between nets mirror the transformations between hierarchies. Visualise a Perspex sheet bearing the network of transformations between hierarchies, and another sheet bearing the network of transformations between nets, and us holding one above the other and drawing a line from each hierarchy to its counterpart net.

The mathematical tool that handles this is a "functor". Visualise it as an arrow that bundles together all the lines you have drawn between the sheets.

A functor can be regarded as a structure-preserving transformation between categories: that is, as an arrow in the category of categories, which maps the category of hierarchies onto the category of nets. Visualise the category of categories as dots and arrows. Each dot is a Perspex sheet like the ones mentioned above. You have drawn lines between the sheets, as above. Each arrow is a bundle of all the lines belonging to one transformation between a pair of sheets.

This works because it maps the parts of the category of hierarchies (whose parts are the hierarchies) onto the parts of the category of nets (whose parts are the nets) in the same consistent way that I described for mapping group identities onto group identities, engines onto engines, or punchlines onto punchlines.

I said that category theory is insanely recursive. I have introduced transformations between categories: namely functors. But one can also have transformations between functors: that is, transformations between transformations between categories. These are called "natural transformations".

Consider our hierarchies and nets again, but now assume we have type-A nets and type-B nets. The category of hierarchies is as before; the category of nets contains the type-A nets and the type-B nets. Visualise the hierarchies as green dots in the hierarchies category, and the type-A's as red dots and the type-B's as blue dots in the nets category. Then there is a bundle of arrows mapping the green dots to the red dots. That's one functor, which I'll call A. There is another bundle of arrows mapping the green dots to the blue dots. That's another functor, which I'll call B. If there is a natural transformation between A and B, it means that they are compatible in a special and particularly useful way. I'll return to this in my section on neural nets.

Category theory and the Interesting Truths

A very general reason that categories will be useful is that they'll help us understand the "Interesting Truths" of computer science: the "subtle unifying insights between broad classes of mathematical structures", to quote the paragraph from SF author Greg Egan's novel Incandescence that I once posted to the n-Category Café:

‘Interesting Truths' referred to a kind of theorem which captured subtle unifying insights between broad classes of mathematical structures. In between strict isomorphism — where the same structure recurred exactly in different guises — and the loosest of poetic analogies, Interesting Truths gathered together a panoply of apparently disparate systems by showing them all to be reflections of each other, albeit in a suitably warped mirror. Knowing, for example, that multiplying two positive integers was really the same as adding their logarithms revealed an exact correspondence between two algebraic systems that was useful, but not very deep. Seeing how a more sophisticated version of the same could be established for a vast array of more complex systems — from rotations in space to the symmetries of subatomic particles — unified great tracts of physics and mathematics, without collapsing them all into mere copies of a single example.

I hope you can see from my description of functors as drawing a picture of one category in another, and of natural transformations as transforming one such drawing into another, why Egan wrote that passage. It applies to all of mathematics. But more specifically, I'm interested in categories because I think they have huge — and under-used — potential in understanding human cognition and in building "intelligent" computers.

Unifying unification with holographic reduced representations

My first example concerns understanding how "logical unification" and "holographic reduced representations" are related. Logical unification is a kind of pattern matching. (It's often just called "unification", but I'll preface with "logical" to avoid confusion with "unify" as in "insight". Logicians reading this, please forgive such a cursory description.)

For instance, in Prolog, unifying the term 2*x+3*y with the term U+V, will bind the variable U to 2*x and the variable V to 3*y. Prolog programmers love this, because it's such an easy way to get at pieces of structures.

But — computer science also knows a different kind of pattern matching, implemented by so-called "holographic reduced representations" or HRRs. These encode structures as bitwise or numeric vectors in very high-dimensional spaces, using circular convolution and circular correlation (or analogous operations) to build and decompose vectors. I'll show an example in the next paragraph; for those who want to know more, these references give some background:

Pentti Kanerva, Dual Role of Analogy in the Design of a Cognitive Computer. In Advances in Analogy Research: Integration of Theory and Data from the Cognitive, Computational, and Neural Sciences. Workshop. Sofia, Bulgaria, July 17–20, 1998.
Jane Neumann, Learning the Systematic Transformation of Holographic Reduced Representations. Cognitive Systems Research, 3, (2002), 227-235. (The link used to work, but is now giving a not-allowed-to-view error.)
Chris Eliasmith, Structure Without Symbols: Providing A Distributed Account of Low-level and High-level Cognition. Southern Society for Philosophy and Psychology Conference March, 1997.
Jocelyn Ireson-Paine, Binary Holographic Reduced Representations in SWI-Prolog. This is an expanded version of the example below, implemented in Prolog which you can download.

Here's an example from Kanerva's paper. The following equation computes a bitwise vector encoding data about France:

France = [Capital*Paris + Location*WesternEurope + Currency*Euro]

Here, + is vector addition, * is exclusive-or. The brackets [] denote a "normalised mean", whose exact definition is given in Kanerva's paper. Capital, Location and Currency are three randomly-generated vectors. The vector Capital represents a slot or field whose value will represent the capital of France; similarly for the vectors Location and Currency.

Paris, WesternEurope and Euro are three more randomly-generated vectors, representing the values of these slots. It's important to note that all six vectors should be interpreted as atomic symbols: the actual values of their components have no significance.

Having computed the vector France, we can "probe" it by exclusive-oring with the vector Capital:

WhatCity = Capital *France
= Capital * [Capital*Paris + Location*WesternEurope + Currency*Euro]
= Capital*Capital*Paris + Capital*Location*WesternEurope + Capital*Currency*Euro
= Paris + (a bit of other stuff).

This works because * (that is, exclusive-or) is its own inverse, and because it distributes over the normalised mean [].

The final stage is to do what Kanerva calls a "clean-up". You assume there is an associative memory that contains the six vectors originally generated. Fed any vector, it will quickly return whichever of the six closest to it. Which in this case,we hope will be Paris.

For this to work, various conditions on the distribution of vectors must be satisfied. But if they are, you can do some really nifty analogical reasoning. For example, you can ask how France is related to Paris by exclusive-oring the vector France with the vector Paris. This works the same way as above, and after the clean-up, will return Capital. Kanerva goes on to describe more ambitious reasoning tasks, such as forming a vector that represents Sweden, then asking "What is the Paris of Sweden"?

On the surface, holographic reduced representations are utterly different from logical unification. But I can't help feeling that, at a deeper level, they are closely related. And there is a categorical formulation of logical unification, described in the first reference below, by Rydeheard and Burstall. They say their formulation is derived from an observation by Goguen. So it may be based (I'm not an expert) on the ideas in the second reference:

David Rydeheard and Rod Burstall, Computational Category Theory. Prentice Hall, 1988. See Chapter 8.
Joseph Goguen, What is unification? A categorical view of substitution, equation and solution. In Resolution of Equations in Algebraic Structures, 1: Algebraic Techniques, Academic Press, 1989.

So, can we extend that categorical formulation to holographic reduced representations? I don't know. But if we could, we would better understand how they are related to logic programming, and we might gain new tools for analogical reasoning. It's worth trying.

(This links to the first reply my original posting got to the above.)

Unifying neural nets with logic

Here's my second example: neural networks, and logic-based representations. I suppose most readers have come across neural networks, but for those who haven't, I'll describe them briefly. Neural networks consist of "nodes" connected in a network. The connections between nodes correspond to synapses between biological neurons. Typically, a node sums the signals from its input nodes, multiplying each input signal by the strength of the corresponding connection. It may then process the sum in some way — e.g. by thresholding — to calculate an output and an "activation". The entire network can be regarded as a vector of these activation values, representing the data in the network.

Most kinds of neural network can learn by adjusting the strengths of their connections, thus mimicking — in a very simplified way — how the brain learns by adjusting synaptic strengths.

In Artificial Intelligence, there is a vast cultural chasm between neural nets and symbolic ways to represent knowledge. It's rare for the same researcher to work on both, and students usually get taught about them in different courses. This reflects the differences between the two. The symbols in symbolic representations denote things. But in some kinds of neural net, the individual nodes don't, although a whole set of nodes might. Therefore, such nets are often called "subsymbolic", a word introduced by Paul Smolensky in the following paper:

Paul Smolensky, On the Proper Treatment of Connectionism. In Readings in Philosophy and Cognitive Science, MIT Press, 1993.

(Holographic reduced representation vectors are also subsymbolic.)

As well as the "symbolic/subsymbolic divide", there is a "compositional/non-compositional divide". Symbolic ways to represent knowledge are "compositional", in that the meaning of a sequence of symbols is determined by the meanings of its parts. Compositionality is good: if you can represent "coffee cup" and "cake", it's nice to be able to represent "coffee cup and cake" too. But if you have (say) an image-processing net which has learnt to represent images of coffee cups by one pattern of activations, and images of cakes by another, you can't assume that superimposing or adding the patterns will represent a coffee cup and a piece of cake. So there has been a lot of argument about questions like: is the brain compositional; is composition by superimposing activation patterns the only kind of compositionality; and (as engineers) how can we design nets that are compositional?

Yet another difference is that symbolic ways to represent knowledge are often said to be "contextual": they are leaky, in that what happens at one node can be very sensitive to data elsewhere in the network. Which is different from my Prolog program that will happily unify 2*x+3*y with U+V, giving the same results no matter what else is in my laptop's memory. Again, there's been a lot of argument about this. Unfortunately, a lot of the key papers on this and the other topics, though on-line, are locked up in pay-to-view sites. The reference below seems a good, and free, starting point:

James Garson, Connectionism, entry in the Stanford Encyclopedia of Philosophy. First published Sun May 18, 1997; substantive revision Wed Mar 7, 2007.

I mention these differences not only to show how far apart neural nets are from symbolic systems, but also because category theory must have a lot to offer in answering such questions. Which is why the two references below are exciting:

Ronald Brown and Timothy Porter , Category Theory and Higher Dimensional Algebra: potential descriptive tools in neuroscience.
Michael Healy, Category Theory Applied to Neural Modeling and Graphical Representations. In Proceedings of the International Joint Conference on Neural Networks, (IJCNN 2000), IEEE.

In the first, Brown and Porter suggest using colimits to understand brain activity in terms of the component structures and neurons. A "colimit" is an operation that takes a collection of objects and arrows, and in some sense generates the smallest object that contains all the information they hold but no more. Colimits can be used to model "putting together" or "glueing" of all kinds of things — sets, groups, categories, program specifications, spreadsheets, programs, ontologies — and this is what Brown and Porter want them for.

Healy goes further. He sets up a category Conc whose objects are theories representing concepts. Arrows represent "is a subconcept of": there is an arrow from c1 to c2 if c1 is a subconcept of c2. He sets up another category Neur, whose objects are nodes in a particular neural network. Arrows represent the "can activate" relationship: there is an arrow from n1 to n2 if n1 can activate n2. Such an arrow tells us something about the neural network, namely that there is a path between n1 and n2, and that all the synaptic connections along this path are strong enough to propagate a signal from n1 to n2.

This wouldn't be interesting unless the categories Conc and Neur were related, and indeed, Healy lets us define functors from Conc to Neur. In effect, each functor maps a network of concepts, represented as theories, to regions in a neural network where these concepts are represented. You can now see where I took some of my introductory examples from.

What's nice is that we can combine these neural-network representations by using natural transformations. I hope I have this right, because Healy doesn't give much detail, but the point seems to be that a single neural network might learn different instances of the same concept.

For example, imagine an ambidextrous robot with mirror-identical arms and hands. Each hand has touch sensors: the left hand's sensors are wired to one set of input nodes in the network, and the right hand's to another set of input nodes. Assume the robot is taught to grasp an orange with its left hand, and then, independently, with its right hand. The region of its neural network fed from the left hand sensors learns a set of associations between finger pressure and movement; the region fed from the right hand learns, independently, a very similar set. This is an example I made up, by the way, so blame me if it turns out to be misleading.

The left hand and right hand associations can then be regarded as two functors, L and R, from Conc to Neur. I'm not sure what the logical content of the objects in Conc would be — theories describing what an orange is for, or what its shape is, or how to grasp it — but at any rate, they'd say something about oranges, and they'd be represented in two different regions in the neural network. We can merge these, Healy says, by setting aside a third region in the network which combines corresponding outputs from each of the first two.

But is this in fact a valid representation of the learnt concepts? Yes, if we let natural transformations guide us. As well as functors L and R, let there be a functor M which maps Conc to this third region. Set up mappings from L to M and R to M. Then, Healy shows, the third region is a valid representation if these mappings are natural transformations.

What does this mean? Going back to my explanation of natural transformations in terms of green and red and blue dots, the concept hierarchy is a network of concepts painted in green. Functor L maps it to a picture of this network implemented in neural nets and painted in red. Functor R maps it to a different picture of this network implemented in neural nets and painted in blue. And if L and R form a natural transformation, it means they are compatible in a special way, such that we can paint yet another picture in purple, that combines the information in the red picture with the information in the blue picture.

Can such ideas be applied to other kinds of neural network? Can we show other kinds of neural network to be reflections, in suitably warped Eganian mirrors, of logical theories? Can we characterise neural networks by properties of the categories containing them? (I suspect many of these categories won't have products, because of interference between subnetworks, but that's only a passing thought.)

And — very importantly, because we engineers have such small minds, so need to break problems into tiny pieces before we can understand them — can we use colimits and other tools to help us build modular neural nets?

(This links to the first reply my original posting got to the above.)

Unifying dynamical systems with logic

My third example is similar in spirit to the last. In the early days of programming, one of Artificial Intelligence's motivating ideas was the Physical Symbol System Hypothesis put forward by Allen Newell and Herbert A. Simon in the following paper:

Allen Newell and Herbert A. Simon, Computer Science as Empirical Inquiry: Symbols and Search. In Communications of the ACM, 19, (March 1976).

This states that: "A physical symbol system has the necessary and sufficient means for intelligent action". I suppose every hacker believes this, and hopes in their deepest soul that, one day, they will write the program that shows this to be true.

More recently, various researchers have proposed using dynamical systems ideas to understand cognition. This is the Dynamical Systems Hypothesis, and there's some background on it in these references:

Nils J. Nilsson, The Physical Symbol System Hypothesis: Status and Prospects.
Robert F. Port, Dynamical Systems Hypothesis in Cognitive Science. Draft entry (December 5, 2000) for Encyclopedia of Cognitive Science, MacMillan Reference.

I was reminded of this by a nice paper I found, written by David Tall about the learning of mathematical concepts. He wrote it before the Dynamical Systems Hypothesis became popular, but it uses dynamical systems ideas to explain how learning one concept might be blocked by a conflicting earlier version of the same concept:

David Tall, Cognitive Conflict and the Learning of Mathematics. Presented at First Conference of The International Group for the Psychology of Mathematics Education, Utrecht, Netherlands, Summer 1977.

So, in a spirit similar to that of my neural network section, I ask: can we use category theory to help us map between logical descriptions of knowledge and their implementation as dynamical systems. Can we use it to help us build modular dynamical systems implementations?

Understanding analogical reasoning — the core of cognition and a chemistry of concepts?

My fourth example ratchets up the stakes, and turns — I'll explain why in a minute — to Margaret Thatcher. In the 1980s, she was Prime Minister of Britain, and Denis Thatcher was her husband. Ronald Reagan was President of America. So, if you were to ask an American "Who is First Lady of Britain?", what could they reply? A reasonable answer is "Denis Thatcher". But to make it, you must be willing to "slip" the concept of First Lady, weakening it to First Spouse as you transfer from an American frame of reference to a British.

Now consider the following two number patterns:

A: 1 2 3 4 5 5 4 3 2 1

B: 1 2 3 4 4 3 2 1

What is to B as 4 is to A? Most people would answer 3. It's to the left of the central pair of numbers in B, just as 4 is to the left of the central pair of numbers in A. 4 in A and 3 in B are like Nancy Reagan and Denis Thatcher: they're not the "most distinguished point" in their landscape, but they are closely related to that point. They fill the same rôle.

Here is another number problem:

A: 1 2 3 4 5 5 4 3 2 1

D: 1 1 2 2 3 3 4 4 5 4 4 3 3 2 2 1 1

What is to D as 4 is to A? A brute-force attempt to find D's central pair won't work. But it seems reasonable to "slip" the concepts of "pair" and "singleton" as we transfer from A's frame of reference to D's. Which gives us the answer 4 4.

These problems come from a column that Douglas Hofstadter wrote for Scientific American. He collected them, with later additions, into a book, and I'm particularly interested here in two of its chapters:

Douglas Hofstadter, Metamagical Themas, Basic Books, 1985. See Variations on a Theme as the Crux of Creativity (Chapter 12) and Analogies and Roles in Human and Machine Thinking (Chapter 24).

Hofstadter argues that analogy is at the core of human thought and creativity. To quote from Chapter 24:

Don't press an analogy too far, because it will always break down. In that case, what good are analogies? Why bother with them? What is the purpose of trying to establish a mapping between two things that do not map onto each other in reality? The answer is surely very complex, but the heart of it must be that it is good for our survival (or our genes' survival) , because we do it all the time. Analogy and reminding, whether they are accurate or not, guide all our thought patterns. Being attuned to vague resemblances is the hallmark of intelligence, for better or for worse.

Therefore, he says, human analogical thought is what we need to study. And we should do so by within "microdomains" such as the number problems, because then — in proper scientific spirit — we can isolate analogical thought from other phenomena.

He was particularly interested in why we slip some concepts more than others, and in what one might call rôles versus literals. As an example of the first: I'm in the pub, and I accidentally spill my pint of beer over my shoes. Why is it natural for me to "slip" my situation to "Good job I don't have to go anywhere smart tonight!", but not to "Shame gravity isn't weaker, so the beer would have stayed in the air!"?

As an example of the second, why, when solving the two number problems, do we perceive 4 as filling a rôle at all in pattern A, then try to find a value for the equivalent rôle in patterns B and D? Why don't we just say, "Well, the obvious answer is the same value, 4"?

There are some lovely examples in those two chapters, which I do recommend to anyone interested. But what's the connection with categories?

Hofstadter and his colleagues wrote several programs to implement their theories of analogical thought in microdomains. These programs include include Copycat:

Page on "Letter Analogies", Center for Research on Concepts and Cognition (also known as the Fluid Analogies Research Group), Indiana University, 2002.
Douglas Hofstadter and Melanie Mitchell, The Copycat Project: A Model of Mental Fluidity and Analogy-Making.

And they include Tabletop:

Robert French, The Subtlety of Sameness: a Theory and Computer Model of Analogy-Making, MIT Press, 1995.

Doesn't that title sound like something to interest every red-blooded n-categorist? Anyway, Tabletop solves problems such as "I am sitting at a table and I touch my nose. What should you touch?". Or "I touch the flower vase in the centre. What should you touch?". Or, one more: "I touch the cup near me. You have a glass but no cup. What should you touch?"

Now, these programs are complicated. They contain swarms of biologically-inspired enzyme-like entities swimming around in a kind of symbol-filled cytoplasmic glop, glomming onto bits of concepts, and slowly crystallising out into a finished answer to the problem. They lack — as far as I know — any easily understandable semantics.

So, I'm going to suggest, can we use categories to give them such a semantics? Or, more generally, to formalise our ideas about the best solutions to the kind of analogy problems that Hofstadter and his colleagues discuss? It seems to me that a good starting point would be some papers by Joseph Goguen on something he called "conceptual blending". In these, he uses colimits and colimit-like operations. As I mentioned when discussing neural nets, these are often used to model "putting together". And in "conceptual blending", Goguen models the putting together of concepts to make metaphors. For example putting together the meaning of "house" and the meaning of "boat" to make the meaning of "houseboat":

Joseph Goguen and D. Fox Harrell, Style as a Choice of Blending Principles.
Joseph Goguen, Formal Notation for Conceptual Blending. 2003.

I shall finish this section with two more Hofstadter quotes. As I said at the beginning, the stakes are high:

[…] I have been emphasising the idea of the internal structure of one concept. In my view, the way that concepts can bond together and form conceptual molecules on all levels of complexity is a consequence of their internal structure. What results from a bond may surprise us, but it will nonetheless always have been completely determined by the concepts involved in the fusion, if only we could understand how they are structured. Thus the crux of the matter is the internal structure of a single concept, and how it "reaches out" towards things it is not. The crux is not some magical, mysterious process that occurs when two indivisble concepts collide; it is a consequence of the divisibility of concepts into subconceptual elements. As must be clear from this, I am not one to believe that wholes elude descriptions in terms of their parts. I believe that if we come to understand the "physics of concepts", then perhaps we can derive from it a "chemistry of creativity", just as we can derive the principles of the chemistry of atoms and molecules from those of the physics of quanta and particles. But as I said earlier, it is not just around the corner. Mental bonds will probably turn out to be no less subtle than chemical bonds.

You can think of concepts as start, and knob-twiddling as carrying you from one point on an orbit to another point. If you twiddle enough, you may well find yourself deep within the attractive zone of an unexpected but interesting concept and be captured by it. You may thus migrate from concept to concept. In short, knob-twiddling is a device that carries you from one concept to another, taking advantage of their overlapping orbits.

Of course, all this cannot happen with a trivial model of concepts. We see it happening all the time in minds, but to make it happen in computers or to locate it physically in brains will require a fleshing-out of what concepts really are. It's fine to talk of "orbits around concepts" as a metaphor, but developing it into a full scientific notion that either can be realised in a computer model or can be located inside a brain is a huge task. This is the task that faces cognitive scientists if they wish to make "concept" a legitimate scientific term. This goal, suggested at the start of this article, could be taken to be the central goal of cognitive science, although such things are often forgotten in the inane hoopla that is surrounding artificial intelligence more and more these days

"These days" were in the 1980s, so perhaps we can disregard the remark about the inane hoopla. Though there seems to be a lot of inane hoopla surrounding computing in general. But what about the rest of the quotes. Does it make sense to think about a "chemistry of concepts"; to make "concept" a legitimate scientific notion?

Actually, Michael Healy, who I mentioned earlier for his work on mapping concepts to neural nets, has with coauthors Thomas Caudell and Timothy Goldsmith, released another report:

Michael Healy, Thomas Caudell, and Timothy Goldsmith, A Model of Human Categorization and Similarity Based Upon Category Theory. University of New Mexico Technical Report EECE-TR-08-0010, 1st July 2008.

This is their abstract:

Categorization and the judgement of similarity are fundamental in cognition. We propose that these and other activities are based upon an underlying structure of knowledge, or concept representation, in the brain. Further, we propose that this structure can be represented mathematically in a declarative form via category theory, the mathematical theory of structure. We test the resulting mathematical model in an experiment in which human subjects provide judgements of similarity for pairs of line drawings using a numerical scale to represent degrees of similarity. The resulting numerical similarities are compared with those derived from the category-theoretic model by comparing diagrams. The diagrams represent distributed concept structures underlying the line drawings. To compare with a more conventional analysis technique, we also compare the human judgements with those provided by a two-dimensional feature space model equipped with a distance metric for the line drawings. The results are equally favorable for both models. Because of this and the putative explanatory power of the category-theoretic model, we propose that this model is worthy of further exploration as a mathematical model for cognitive science.

Which is extremely interesting, and reminiscent of Hofstadter's remark: "Thus the crux of the matter is the internal structure of a single concept […]".

"A necessary tool in the construction of an adequately explicit science of knowing"

I'll finish by talking about a paper by the famous category theorist William Lawvere. It's the following reference — which, I'm pleased to note, is public at Google Books — and I mention it because I found it in a book on cognition:

William Lawvere, Tools for the Advancement of Objective Logic: Closed Categories and Toposes. In John Macnamara and Gonzalo E. Reyes (eds.), The Logical Foundations of Cognition, Oxford University Press, 1994.

Here's how Lawvere concludes the paper:

Despite some simplification in the above, needed for rapid description, I hope that I have made clear that there is a great deal of useful precision lying behind my illustrations, and a great deal to be developed on the same basis. Thus I believe to have demonstrated the plausibility of my thesis that category theory will be a necessary tool in the construction of an adequately explicit science of knowing.

I don't know enough to describe these "illustrations" succinctly, but they include a notion that he often refers to, the "unity-and-identity-of-opposites", which I discovered from this reference:

Steven Awodey, Lars Birkedal, Dana S. Scott, Local Realizability Toposes and a Modal Logic for Computability (Extended Abstract). Marked as "To be published in Electronic Notes in Theoretical Computer Science".

is also known as an adjoint cylinder. I mention this only because it sounds like something wonderful.

Lawvere's illustrations also include the notion of an "adequacy comonad". I hadn't come across these before, but Lawvere's explanation of them sounds as though they might be very useful in understanding approximations to concepts. Moreover, the adequacy comonad sounds like an exquisite and intricate piece of mathematical machinery, and I hope someone will give me a working model. One I can pick up, prod, peer into, pull to pieces, and generally play with. Steam or electric.

However, I've another reason for mentioning the notion. Lawvere uses it to show how the study of a kind of category called a "closed category" contributes, via adequacy comonads, to the study of "metric spaces". These are mathematical structures that consist of points and a way of defining the distance between them: mathematicians use them in investigating the concept of distance, abstracted away from other concepts.

He then says, if I understand him correctly, that, conversely, we can take notions such as "radius" and "engulfing" from metric spaces and create precise analogues of these notions within certain kinds of category. His wording is technical, so I've omitted it from this Dobbs blog version of my answers, but you can find it in my original n-Category Café posting.

Perhaps Lawvere is using the words "radius" and "engulfing" in some sense far removed from their everyday meaning. But if not, can his remark also help us import other spatial and dynamic notions into category theory? Such as "orbit" (around a concept); "attractive zone" (of a concept); "migrate" (from concept to concept).

In other words, can we use it to import into category theory, and thus into categories of neural nets and other cognition-related entities, the metaphors that Hofstadter says must be made precise before we can create a "chemistry of concepts"? If so, then category theory will be an invaluable tool for cognitive science.

Even if I'm completely wrong in that last paragraph, Lawvere says category theory "will be a necessary tool in the construction of an adequately explicit science of knowing", which is good enough for me. So I did an experiment. I searched Google and CiteseerX on his main title, Tools for the Advancement of Objective Logic.

Apart from citations in library catalogues, and papers on philosophy or pure category theory, I found only two authors whose papers seemed, on a quick skim, to depend in some important way on Lawvere's paper. For interest, I give the references below. (The second author, Zippora Arzi Gonczarowski, had one paper on a pay-to-view site that I couldn't read, but the abstract shows me that it's about the same topic as the paper below):

François Magnan, Distributed Components Aggregation for eLearning: Conducting Theory and Practice. (This link did work, but the site now belongs to a domain squatter.)
Zippora Arzi Gonczarowski, Wisely Non-Rational. 1998.

Lawvere's paper seems filled with crystals of compressed insight, glittering through the surface of the text like quartz through a covering of dry pine-needles. I find them intensely satisfying to contemplate, in the same way I do some El Lissitzky posters and De Stijl typography. Perhaps, high in my brain's abstraction hierarchy, the same neurons are activated by both: category theory as the ultimate abstract sculpture.

But, aesthetics aside, how can we bring such power to more researchers than the two who cited it? I'm using Lawvere's paper as a representative of category theory generally, of course. Ask a first-year LISP hacker to iterate over a list, and they'll reach naturally for recursion or a MAP function. Can we bring up a generation of category-theory hackers, who, when you ask them to model putting something together, will reach naturally for a colimit?

(This links to the first reply my original posting got to the above.)

By the way — spreadsheets!

I thought I'd add one final reason for my interest in category theory. Spreadsheets!

Excel spreadsheets are a big source of errors. One reason is that Excel doesn't have the notion of "modules": chunks of program that can be written, tested and debugged one at a time, then used in as many different spreadsheets as you want. If you develop a spreadsheet in Excel, and want to use part of it in another spreadsheet, you have to copy and paste cells from the first spreadsheet to the second. If you then need to change these cells — perhaps because they have bugs, or perhaps to make the formulae in them faster — you have to make the same changes to both spreadsheets. You can't just change one component, and have the changes propagated to every use of it. So Excel programmers tend to accumulate numerous almost-identical copies of their code. Version-control becomes a nightmare.

I've been experimenting with ways to overcome this, via various schemes for modularising spreadsheets. And below are references to a demo of one such scheme — a prototype component library for Google's online spreadsheet, which lets you add code to a spreadsheet much as you would bar-charts and other displays — and to a paper about the component library for Google's spreadsheet and for Excel. Both are for spreadsheet users, so not very technical:

Jocelyn Ireson-Paine, Spreadsheet Components, Google Spreadsheets, and Code Reuse. Dr Dobbs blog entry, May 2008. This is the demo, with screen shots.
Jocelyn Ireson-Paine, Spreadsheet Components For All. In Proceedings of European Spreadsheet Risks Interest Group 2008, Greenwich, July 10-11, 2008.

Both these were inspired by Joseph Goguen's work on sheaf semantics, and his dogma that the behaviour of a system is the limit of the behaviours of its components, explained in:

Joseph Goguen, Sheaf Semantics for Concurrent Interacting Objects. In Mathematical Structures in Computer Science, 2 2(1992).

I said a bit about this in a posting Re: An Invitation to Higher Dimensional Mathematics and Physics to the n-Category Café. It seems that Goguen's work also inspired research into the algebraic specification languages for software.

It illustrates the power of category theory to project down into unsuspected regions of computing. Even if we shall never see a book on Category Theory with Excel.

(This links to the postings after my original.)

What Might Category Theory do for Artificial Intelligence and Cognitive Science?

Introduction

Contents