[ Jocelyn Ireson-Paine's Home Page | Publications | More about Excelsior ]

How To Avoid Hours Of Tedious Spreadsheet Editing



Jocelyn Ireson-Paine and Emre Tek
www.j-paine.org/ and www.spreadsheet-factory.com/
popx@j-paine.org

Summary

This handout acompanies the EuSpRIG 2006 paper Rapid Spreadsheet Reshaping with Excelsior: multiple drastic changes to content and layout are easy when you represent enough structure, by Jocelyn Ireson-Paine, Emre Tek, and Duncan Williamson.

Last year, I talked about Excelsior. This is the world's first program for modularising spreadsheets, enabling them to be built from components which can be independently created, tested, debugged, and updated. This year, I want to show how Excelsior can slash the time and effort needed to restructure spreadsheets.

I'll illustrate with a commercial feasibility test carried out for social-housing consultants Weedon Grant. Emre Tek and I restructured a 10,000-cell housing-finance spreadsheet containing some 60 interconnected 20 × 40 tables. We generated new versions of this spreadsheet, with table sizes varying from 5 × 10 to 200 × 2,000. We also moved tables between sheets; and flipped their orientations. Each change generated a spreadsheet with different structure but identical outputs; and each change took but a few minutes.

I suggest that companies needing to comply with Sarbanes-Oxley should consider Excelsior. My work demonstrates that it is possible to store the essence of a spreadsheet as text, generating from this different sizes and layouts as needed. This simplifies version-control, there being fewer versions to control! Modularisation adds to this advantage; and because Excelsior programs are text, they can be stored in the high-quality version-control systems already developed for conventional programming languages.

Introduction

The people of Greenwich have bad hair. I know this, because last year's EuSpRIG conference was in Greenwich. All the bus shelters and other advertising spaces were plastered with shampoo posters depicting an unbeautiful lady resembling Medusa after an unsuccessful snake-do. Underneath was the caption "Get rid of your hair monsters!" I don't know why Greenwich needs such an advertising barrage: perhaps it's those raw salty estuarial winds. But it made me think that I, or perhaps EuSpRIG, should adopt as slogan: "Get rid of your spreadsheet monsters!"

Last year, I attacked the Monster of Non-Modularity. Imagine you're a multinational, with cashflow spreadsheets to make for departments in 5, 10, even 20 countries. All the spreadsheets have the same basic structure, but because tax laws differ, each spreadsheet must have its own tax calculations.

In most programming languages, you'd code this by defining a basic cashflow module into which can be plugged one of several interchangeable tax modules. But you can't do that in Excel. Instead, you are forced to build a complete new spreadsheet for each country. You end up with lots of related but different spreadsheets. Trying to keep them all in synch will eventually leave you pulling out your hair in frustration. My research is about overcoming this; and at last year's conference, I presented Excelsior, which I believe to be the first ever program for modularising spreadsheets.

This year, still with Excelsior, I attack the Monster of Tediously Repetitive Editing: he that has 1,000 eyes and an arm for every cell, and looks like Kali crossed with Argus crossed with Doctor Octopus. With so many eyes and arms, the M. of T.R.E. enjoys nothing more than recoding every cell in a spreadsheet just in case it has been invalidated by a previous change. I don't.

Weedon Grant, housing associations, and the need for bespoke spreadsheets

I did this work in collaboration with Emre Tek of Weedon Grant. He was looking for tools to modularise spreadsheets, so I'll explain a bit about that first, before getting on to the avoidance of tediously repetitive editing.

Weedon Grant's clients are councils and housing associations who manage "social" or "affordable" housing. These clients' main recurrent income is from tenants' rents. The clients' costs include: repairs and maintenance; buying or building new housing and demolishing housing that can no longer be maintained; and tax.

Weedon Grant sell spreadsheet-based models to these clients. The models forecast business performance over typically 40 years, given assumptions about such factors as: expected number of dwellings rented to tenants in each year; number sold, demolished or built each year; yearly rent increases; repair and maintenance costs; Government grants and other funding; and tax.

What has this to do with modularity? Different housing clients want different models. One, perhaps, does not intend to build any new housing after its first year: it would therefore like all the "new housing" columns eliminated from its model. Another would like to use its own repair-costs spreadsheet, plugging this into Weedon Grant's model in place of that model's repair-cost calculations. And another has a special source of funding not catered for by the generic Weedon Grant model, and therefore asks Weedon Grant to code a special set of calculations to take the place of those normally provided.

Such situations are exactly where modularity is useful, and are why Emre contacted me to see whether Excelsior could speed up and make safer the development of such bespoke models.

The need for easy-to-change table sizes

So that's why Weedon Grant want modularity. But there's another problem. Consider one of their spreadsheets, the "stock model", with which I give my demo at EuSpRIG 2006. This has about 10,000 cells. Of these, 9,000 lie within tables. About 60 tables are two-dimensional. One dimension is year, the other is dwelling type, and the tables record quantities such as unit rent, number of tenants, or total income.

The standard version of the stock model caters for 20 dwelling types and 40 years, so each of these 60 tables is 20 cells by 40. However, some clients request different sizes. Clients who manage only one or two types of dwelling would like to remove the remaining dwelling-type columns in each table, to save memory and reduce the need for horizontal scrolling. On the other hand, clients with more than 20 dwelling types would like to expand the tables. Too, some would like to forecast over more than 40 years.

This can be done in Excel. And we all know how much work that is …

Demonstration

At this point in my EuSpRIG 2006 talk, I give a demo. I show a "stock model" spreadsheet as originally written, with its 20 by 40 tables. I then start up a program written using Excelsior, which generates new spreadsheets. It has four menus. One selects the file where the Excelsior version of the spreadsheet is stored. One selects a "layout grid" file which specifies the spreadsheet's layout. One selects number of years, and defaults to 40. And one selects number of dwelling types, and defaults to 20.

In the first part of the demo, I generate new spreadsheets with different numbers of years and dwelling types. I do this just by selecting the appropriate numbers from the menus and pressing a "Generate" button. All the tables in the new spreadsheets have the new sizes, but other properties of the spreadsheets, including the location of captions relative to tables and the spacing between tables, remain the same. The time taken by each change is that needed to select the numbers and press "Generate", plus the time Excelsior takes to generate the new Excel file.

In the second part of the demo, I generate new spreadsheets with significantly different layouts from the original. For example: tables and captions moved from one sheet to another; tables oriented with years running across instead of down; rows, columns, and captions inserted or deleted. I do this by editing the layout grid and then pressing "Generate".

How to use Excelsior

An Excelsior program has two main components. The first is a set of text files specifying the calculations to be done. These are described in terms of named tables together with formulae. For example, here is a program that creates a Year table and a Lettings table, and gives a value to each element of the Year table:

  {# Year[ 1:3 ]
   , Lettings[ 1:2, 1:3 ]
   | Year[ all y ] = y+2002
  #}

The second component is the "layout grid" which describes how these tables are to be laid out in a spreadsheet. This works like a format in a conventional programming language, except that it has more than one dimension. In fact, it is a set of sheets, each sheet corresponding to one sheet in the generated spreadsheet. To generate that spreadsheet, Excelsior treats text in quotes as captions. Unquoted text, it assumes to be tables: you will see that the grid mentions both Year and Lettings. These names, it replaces by the formulae from the Excelsior code. Should those formulae refer to tables, it translates the references to the corresponding cells. The orientation of each table is specified by a descriptor such as "x", "y", "xy", or "yx". For example, a one-dimensional table with descriptor "x" is laid out running along a row, i.e. in the x direction. With the descriptor "y", it would be laid out down a column:
Sheet Lets
"EXCELSIOR DEMO"    
"Flats Let"    
  "Year" "Flat type"
  Year y Lettings xy

Here is the spreadsheet generated from the above two components. To make it more realistic, I've typed some data into the Lettings cells:
Sheet Lets
EXCELSIOR DEMO      
Flats Let      
  Year Flat type  
  2003 14 92
  2004 15 64
  2005 18 12

Where do Excelsior programs come from?

There are two ways to make an Excelsior program. The first is to write it from scratch. This is what Weedon Grant hope to do.

The second is to generate it from an existing spreadsheet. To do this usefully, one has to discover how cells in the spreadsheet are grouped into tables. Without this information, Excelsior doesn't know how to resize these groups any more then Excel does. I call the process of generating an Excelsior program from a spreadsheet "structure discovery"; an example is given in my paper.

How do I implement resizable tables?

Technically speaking, this is done by making the Excelsior program into a function. An Excelsior program such as that above actually denotes something that I call an "object": essentially, a set of tables and formulae. We can make this object into an "object-valued function", and give it parameters that determine the size of its tables. Here's an example:

  let MyModel( NYears, NDwellingTypes ) be
  {# Year[ 1:NYears ]
   , Lettings[ 1:NDwellingTypes, 1:NYears ]
   | Year[ all y ] = y+2002
  #}

What I've done is to replace the size costants by variables, and include these variables in the added first line, the function heading. The clever part is that this function can be called with any size of arguments, and yet will still work with the layout grid. Part of my research was devising a form of layout specification that was independent of size: the layout grid is it.

For spreadsheeters who don't want to write Excelsior code, we are working on graphical user interfaces and closer links to Excel.

How do I break a program into modules?

Excelsior has operations for merging or combining programs together into bigger programs. These are described in my EuSpRIG 2006 paper; and more fully in my EuSpRIG 2005 paper, Excelsior: bringing the benefits of modularisation to Excel.

Excelsior and Sarbanes-Oxley

I suggest those needing to comply with Sarbanes-Oxley should consider Excelsior. My work shows that it is possible to store the essence of a realistically large and complicated spreadsheet as text, generating from this different sizes and layouts as needed. Only one version of the text need be stored, rather than one for each size and layout. This makes version control much simpler, there being fewer versions to control!

Modularisation adds to this advantage. With Excelsior, and returning to Weedon Grant as an example, it isn't necessary to store both a spreadsheet built from StandardHousingFinance plus StandardRepairCosts, and one built from StandardHousingFinance plus MyBespokeRepairCosts. Instead, you store the three modules StandardRepairCosts, StandardHousingFinance, and MyBespokeRepairCosts. When you want to generate a complete spreadsheet, you do so there and then using the module-composition operators, ship it, and then simply forget it.

Sarbanes-Oxley requires companies to be meticulous about recording changes to financial software. The features of Excelsior noted above should make this easier because there will be fewer changes to manage, and because each spreadsheet module need be stored only once, in some well-auditable central repository. Furthermore, because modules are stored as text, they can be handled by the high-quality version-control systems already developed for other programming languages; something not possible for spreadsheets stored as Excel code.

By the way, I undertake spreadsheet development using Excelsior. Contact me at popx@j-paine.org.

[ Jocelyn Ireson-Paine's Home Page | Publications | More about Excelsior ]