[ Jocelyn Ireson-Paine's Home Page ]

Report on visit to University of Minho, May 2000

For the benefit of people reading the printed copy, this report is available as visit_report.html.

This is my fourth visit to the University of Minho, and my third to the Department of Informatics. My first two visits to Informatics were to the Project DAVID workshop in September 1996, and to the department in April and May 1998, on a travel grant from the Fundação para a Ciência e a Tecnologia. I had also visited the School of Economics and Management in September and October 1998, to discuss applications of informatics to economics, and to redesign their Web site. This entailed implementing a database of staff and students which generated output automatically in response to searches.

Foreigners who come here are lucky in having a rich and varied culture to explore; this time as ever, I enjoyed the chance to visit Braga, and to experience more of the food, drink, music and literature of Portugal and the Minho. Special recommendations to Pastelaria Sydney, Restaurant Aires, and Grafonola; also to the Adega Transmontana, Daniel's Bar, and Casa de Pasto Pregão. The Monumentais Festas do Enterro da Gata were on, as I had already seen in 1998. This year, I was able to see the Cortejo Academico again. Less praiseworthy features of the Enterro were the mosquitos of Santoinho, and the way your students bully the bulls. Braga continues to grow, and it really is time Mosquito Mechado stopped (a) knocking down old houses; (b) building apartments; and (c) knocking down the old houses by the Este and then building apartments on the ruins. Show some sensitivity to the space of the people living there.

But the main purpose of my visit was not to criticise Braga's town-planning policies but to discuss machine learning. I found time to do a few other things also, and these are described below.

Machine learning and economic data

My visit was funded by the British Council, so that I could meet Paulo Azevedo to discuss collaboration on applying machine learning to large amounts of economic data, which I have access to through my work at the Institute for Fiscal Studies. While here, I organised access to one of the datasets, the Family Expenditure Survey, which contains detailed microeconomic data on UK household spending since the 1960s. The IFS has used this in many research projects, such as an analysis of the effects of recent Budgets, and we hope Paulo's machine learning algorithms will detect some interesting correlations in it. We agreed on a suitable format for the data, and I obtained a sample for Paulo to try his system on. He will make a return visit (funded by the Fundação para a Ciência e a Tecnologia) to the IFS in June, to give a presentation on machine learning and to meet Graham Stark, who implemented the microeconomic models which use this data. Graham is the person I work with when at the IFS, and is to thank for getting me the data.

Generalisation as adjunction

I believe that while here, I may have found a categorical formulation of generalisation, as an adjunction. (Adjunction is a standard categorical construction: for a nice on-line introduction to it, see John Baez's articles on This Week's Finds in Mathematical Physics Week 75 to Week 79.) If this works, it would be very interesting, because it would apply to all forms of generalisation, including neural nets, logical induction, and even statistical curve-fitting.

Intuitively, the idea is that any generalisation should contain just enough information to reconstruct the examples from which it was learnt, plus the extra points that one wishes to interpolate, extrapolate, or otherwise predict. If the generalisations are expressed in the same language as the examples, we can express this idea by creating a category whose objects are sets of examples or generalisations, and then using the 'limit' construction to formalise the notion that, in some sense, the generalisation is a limiting case of the information contained in the examples.

However, because the language in which generalisations are expressed may not be the same as that of the examples, we need two categories, not one. Also because of this, and because of problems such as noise, we shan't always be able to find a perfect fit of generalisation to examples. This makes adjunction the appropriate construction. I am still trying to decide what kind of extra structure (morphisms) should be imposed on the categories.

I've now drafted a report on this.

If the formalisation works, it may be useful in developing algorithms such as those described in Chris Thornton's book Truth from Trash, by showing how the 'optimum' generalisation arises as a recursively-computed limiting case of less complete generalisations.

Of course, it's also possible that this formalisation is just form and no content, or even wrong. José Valença has suggested that we are sometimes too keen to make things into categories, and that it would be worth looking at topological spaces, Stone's theorem, and the connection between frames and spaces instead. This reminds me of some work that Hilary Priestley wrote up on Galois connections and lattices to formalise the relation between concepts and properties.
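To make the Galois-connection idea concrete, here is a minimal sketch in Python. A Galois connection is an adjunction between partially ordered sets, so it is the simplest setting in which to see the pattern. The sketch uses the textbook "derivation operators" between sets of examples and sets of properties; it is not the formalisation in my draft report, and the toy data (HAS, and the names intent, extent, subsets and is_galois_connection) are hypothetical, chosen only for illustration.

```python
from itertools import chain, combinations

# Hypothetical toy context: which example (object) has which property.
HAS = {
    "sparrow": {"flies", "has_feathers"},
    "penguin": {"swims", "has_feathers"},
    "bat": {"flies", "has_fur"},
}
OBJECTS = frozenset(HAS)
PROPERTIES = frozenset(chain.from_iterable(HAS.values()))


def intent(objects):
    """Properties shared by every object in the set (all properties if it is empty)."""
    shared = set(PROPERTIES)
    for obj in objects:
        shared &= HAS[obj]
    return frozenset(shared)


def extent(properties):
    """Objects that have every property in the set."""
    return frozenset(o for o in OBJECTS if set(properties) <= HAS[o])


def subsets(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return (
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )


def is_galois_connection():
    """Check that A <= extent(B) iff B <= intent(A) for every pair of subsets."""
    return all(
        (a <= extent(b)) == (b <= intent(a))
        for a in subsets(OBJECTS)
        for b in subsets(PROPERTIES)
    )
```

In this setting, extent(intent(A)) is the smallest set of examples that contains A and is exactly characterised by a set of properties, which is one simple reading of "the tightest generalisation of A". Whether the same pattern survives the move to two different languages and noisy data is precisely the open question above.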

Algebraic Web specification and CafeOBJ

I also continued the work for which I visited in April-May 1998, on algebraic specification of Web sites. In that visit, I had installed the CafeOBJ algebraic specification language and experimented with it for specifying how interactive Web pages work. Unfortunately, the implementation turned out to be unbearably slow and not to work on several machines.

This visit, with help from Hendrik Hilberdink, I obtained a Linux implementation and installed it on LMF. This is freely available: just type the command cafeobj. If you want to install your own copy, I have downloaded the files onto Shiva: follow the installation instructions.

An example of its use for specifying - and generating - Web sites is here. But there's more to say than just including this code, and I shall say some of it tomorrow in a talk to the Department. The text of the talk, in a rough form, is here.

Web-based interactive economic models and Portuguese debt problems

While I was talking to Paulo Azevedo, the topic of debt amongst the Portuguese middle classes arose. This is apparently a big problem, and one that worries the Government: there have been several recent TV programmes about it. Part of the problem is that people can arrange to borrow some or all of their salary a month in advance from banks, and then don't realise how much the interest is, and may act as though they actually have the money rather than just having borrowed it. Perhaps more serious is the tendency to take out large amounts of credit to pay for houses, cars, and even weddings and holidays. Following on from this, we had the idea of building a Web-based interactive financial planner. This would allow users to enter details of their income and expected expenses (including borrowings), and predict the net amount left month by month after credit and interest repayments.
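The month-by-month prediction could be sketched roughly as follows. This is a minimal illustration in Python, not the model that would eventually sit behind the planner: it assumes constant monthly income and expenses, and fixed-payment loans with interest charged monthly at one twelfth of a nominal annual rate. The names Loan and project, and all the fields, are hypothetical; real Portuguese credit arrangements would need to be researched first.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    """A hypothetical fixed-payment loan; all field names are illustrative."""
    balance: float          # amount still owed
    annual_rate: float      # nominal yearly interest rate, e.g. 0.12 for 12%
    monthly_payment: float  # agreed repayment per month


def project(income, expenses, loans, months):
    """Return the net amount left in each month after expenses and repayments."""
    # Copy the loans so that the caller's objects are not modified.
    loans = [Loan(l.balance, l.annual_rate, l.monthly_payment) for l in loans]
    net_by_month = []
    for _ in range(months):
        repayments = 0.0
        for loan in loans:
            if loan.balance <= 0:
                continue  # this loan has been paid off
            interest = loan.balance * loan.annual_rate / 12
            # The final payment may be smaller than the agreed amount.
            payment = min(loan.monthly_payment, loan.balance + interest)
            loan.balance = loan.balance + interest - payment
            repayments += payment
        net_by_month.append(round(income - expenses - repayments, 2))
    return net_by_month
```

For example, project(1000, 700, [Loan(200, 0.12, 101)], 4) returns one figure per month, showing how much of the 300 surplus is absorbed by repayments until the loan is cleared. A loan whose payment is smaller than its monthly interest is never cleared in this sketch: its balance simply keeps growing, which is the situation the planner would want to warn users about.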

The background to this is that most of my recent work at the IFS involved building interactive Web-based systems for teaching the public about economics and finance. In the first, Be Your Own Chancellor, users could play at being Chancellor of the Exchequer, changing tax rates and other policy variables and seeing the effects of their actions on a range of sample UK families. We then developed this program into Virtual Economy, a complete Web-based system for teaching economics in schools, and also implemented it at the BBC. The second program, Budget 9x, allowed users to type in details of their income and expenditure, and see how the Budget would affect these. We also implemented this at the BBC. So with this experience, building a new interactive system would be relatively easy.

I obtained space on Logica, and built a prototype using Java Server Pages. (Thanks are due to Alcino Cunha, Bacelar Almeida, José Bernardo Barros, and José Faria for much help with machines and installing software.) You can see it here. Following a suggestion from Xana Barros, I called it Patinhas, after Walt Disney's miserly duck, a character apparently known to every Portuguese. (The duck's name in English is Scrooge McDuck, after Charles Dickens's Ebenezer Scrooge.) Although the forms are partly interactive, there is no financial model behind the system yet. I had hoped to get help from one of the economists, Miguel Angelo, but he was too busy.

From our experience in the UK, I believe Patinhas is worth finishing, and would gain good publicity for the University. Ideas for publicity include submitting a press release to the Diario do Minho and the Correio do Minho, which seem to publish anything the University sends them; organising an open day at one of Braga's cybercafés, which would probably welcome the publicity and so provide facilities free; and organising a University open day on interactive learning. First though, the program needs to be finished. The programming itself is not difficult: most of the work would be finding out about Portuguese credit arrangements. Pedro Henriques suggested organising this as a project to be done by one of the Economics and Management or Public Administration students that the Department teaches. These tend to be mature students with good knowledge of economics. Jose Lima is teaching this course, and I discussed the project with him.

Kawa, Camila, and functional languages in Java

There is an implementation of the Lisp-like functional language Scheme called Kawa. This is written in Java, compiles into Java virtual machine code, and can be mixed freely with Java: Java classes can be loaded into Kawa programs and vice-versa. In addition, Kawa is interactive, making it a useful front-end for interactive testing of Java programs. There are papers on its implementation here.

I suggest that it would be very useful to implement Camila in the same way, because this would make it extremely portable, and give access to Java's vast libraries - particularly valuable for network programming and graphical user interfaces. As a demonstration of Kawa, I've installed version 1.6.1 under my username popx on Shiva, and put the tar file from which I installed it here. If you don't want to install from that, you can still try Kawa out by adding /home/popx/kawa-1.6.1/ to your Java CLASSPATH and giving the command java kawa.repl. There is documentation under the directory above, which I've copied to my public_html/kawa/ directory.

Chu spaces and spreadsheets

Some time ago, I developed a language for safe spreadsheet programming, MM, based on category theory and Goguen's sheaf semantics of objects. During my 1998 visit, Luis Barbosa suggested that MM could be used for prototyping specification systems based on his Chu-space semantics for processes. We discussed this briefly, but there hasn't been time to take it further. I hope that we shall eventually be able to.

Computer-generated animation and preservation of Minho dances

During conversation with Zé João, the topic of preserving the local folkdances arose. I believe computer animation is now advanced enough that quite good animations of people moving can be generated from simple descriptions of joint positions and attitudes. Separately, there are standard notations used to describe the dance steps of various countries. These are much more schematic, with primitives such as: cross left foot behind right; step back onto right foot; rotate body through half turn. I suggest that it would be worth considering how one could translate these into the computer animation descriptions, and whether by doing so, one could generate animations of dances directly from the dance-description languages. This would have advantages over displaying the dances by animating photos: for example, one could view a dancer's movements from any position, which would be very useful for beginners. This would also be a good way to preserve information about the local dances before they disappear, so perhaps it's worth looking for funding from agencies concerned with protecting the national heritage.
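As a very rough sketch of what such a translation might look like, the Python below maps each notation primitive to a function that turns one pose into the next, giving a list of keyframes that an animation system could interpolate between. Everything here is hypothetical: the Pose fields, the distances, and the function names are invented for illustration, foot positions are kept in fixed floor coordinates regardless of which way the body faces, and no real dance notation or animation format is being modelled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pose:
    """Hypothetical, greatly simplified dancer state (not a real animation format)."""
    left_foot: tuple   # (x, y) position on the floor, in arbitrary units
    right_foot: tuple  # (x, y) position on the floor, in arbitrary units
    heading: float     # direction the body faces, in degrees


def cross_left_behind_right(pose):
    """Place the left foot just behind and beyond the right foot."""
    x, y = pose.right_foot
    return replace(pose, left_foot=(x + 0.2, y - 0.2))


def step_back_right(pose):
    """Move the right foot one (illustrative) step backwards."""
    x, y = pose.right_foot
    return replace(pose, right_foot=(x, y - 0.5))


def half_turn(pose):
    """Rotate the body through 180 degrees."""
    return replace(pose, heading=(pose.heading + 180.0) % 360.0)


# Lookup table from notation primitives to pose transformations.
PRIMITIVES = {
    "cross left foot behind right": cross_left_behind_right,
    "step back onto right foot": step_back_right,
    "rotate body through half turn": half_turn,
}


def to_keyframes(steps, start):
    """Translate a sequence of notation primitives into a list of keyframe poses."""
    frames = [start]
    for step in steps:
        frames.append(PRIMITIVES[step](frames[-1]))
    return frames
```

The point of the design is only that each primitive becomes a small, separately testable transformation, so a notation for a new dance needs new table entries rather than a new program.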

Machine learning and computer-aided tomography

I had lunch with Maia Neves, and we discussed the possibility of my joining in on a grant to apply his machine learning methods to CAT.

25th June 2000.
