[ Jocelyn Ireson-Paine's Home Page | Publications | Dobbs Code Talk Index | Dobbs Blog Version ]

Snobol Patterns in Prolog III: Sharing Code with Higher-Order Programming

Two Sundays ago, I walked with my friend Hendrik Hilberdink along the Oxford canal and into Kidlington. As we wandered along Kidlington's stunningly tedious main street, I pointed out the gym that I use. Oxford prices being what they are, it's cheaper than ones nearer home. But I haven't visited it recently, because I reckon I've expended enough calories letting my fingers do the walking. They've been typing recursive list decompositions and base cases in Prolog, pounding out the same old boring boilerplate code over and over and over again. In this essay, I'll show you how to eliminate the boilerplate by using higher-order programming: functions that take other functions as arguments; or in Prolog, predicates that take other predicates as arguments. I have experimented with it to share code between the "span" and "span_count" patterns that I talked about in Snobol Patterns in Prolog II: Span with Count.

If you look back at my essay Arbno, the Cursor, and Snobol Patterns in Prolog, you will see that the "span_count" from Snobol Patterns in Prolog II is a simple variation on the "span" I showed there. So is there a single abstraction with which I could implement both? Specifically, can I write a higher-order predicate with which I could implement both?

In case you're not used to higher-order programming, there are a number of examples in Lee Naish's paper Higher-order logic programming in Prolog. I'll give a simple example using an SWI-Prolog built-in, the maplist/2 predicate:

10 ?- maplist( write, [a,b,c,d] ).
abcd
true.
This applies "write" to every element of [a,b,c,d], generating the output shown.

As another example, assume that I want to convert every atom in a list to a list of its character codes. The built-in predicate name/2 will do this for one atom. But using SWI's maplist/3, I can iterate name/2 over a list of atoms:

11 ?- maplist( name, [a,b,c,d], Result ).
Result = [[97], [98], [99], [100]].

Such higher-order calls avoid the need for us to code explicit recursive loops. You know the kind of thing:

write_list( [] ) :- !.

write_list( [ H | T ] ) :-
  write( H ),
  write_list( T ).

?- write_list( [a,b,c,d] ).
And we decide we needn't visit the gym today, because our fingers have done enough walking, pounding out that same old boring boilerplate over and over and over again. Because if I've coded write_list as above, and I want another write_list that calls display/1 instead of write/1 , then I have to type another five lines, almost but not quite identical with write_list's. But given "maplist", I need only pass it "display" rather than "write":
12 ?- maplist( write, [ a, -b, c+d, g is e/f ] ).
a-bc+dg is e/f
true.

13 ?- maplist( display, [ a, -b, c+d, g is e/f ] ).
a-(b)+(c, d)is(g, /(e, f))
true.

At this point, I'm going to introduce a way to write predicates without having to name them. Because, suppose I want to output a newline after each list element. If we had a "write_line" predicate that wrote its argument and then output a newline, I could just pass that to maplist.

But we don't have a "write_line" predicate. I could, of course, define one. But I'd prefer not to clog up my source code with tiny little helper predicates. So instead, I'm going to borrow a trick pointed out by Joachim Schimpf in a 2004 SWI-Prolog mailing Re: [SWIPL] "anonymous" predicates in maplist and friends? It's to imitate the style of definition used by functional languages for anonymous functions, implementing it as follows:

lambda( A1, GoalTemplate, X1 ) :-
  copy_term( A1-GoalTemplate, X1-Goal ),
  call( Goal ).


lambda( A1, GoalTemplate ) :-
  copy_term( A1-GoalTemplate, _X1-Goal ),
  call( Goal ).

And here are some calls:

14 ?- call( lambda(A,write(A)), a ).
a
true.

15 ?- maplist( lambda(A,(write(A),nl)), [ a, -b, c+d, g is e/f ] ).
a
-b
c+d
g is e/f
true.

16 ?- maplist( lambda(A,format('"~w"~n',[A])), [ a, -b, c+d, g is e/f ] ).
"a"
"-b"
"c+d"
"g is e/f"
true.

17 ?- maplist( lambda(A,write(A)), [ a, -b, c+d, g is e/f ] ).
a-bc+dg is e/f
true.

18 ?- maplist( lambda(A,true), [ a, -b, c+d, g is e/f ] ).
true.

19 ?- maplist( lambda(X,(Y is 2*X,write(Y))), [ 1, 2, 3, 4 ] ).
2468
true.

Schimpf's "lambda" invokes the built-in predicate copy_term/2, for much the same reason that my variants of "arbno" do in Arbno, the Cursor, and Snobol Patterns in Prolog. The best way to see why it works is to dry-run one of the calls just above, following the variable bindings it creates.

The maplist predicates aren't the only higher-order ones we can write. Two that functional programmers know well are "foldl" and "foldr", which apply a binary operator to adjacent elements of a list. For example, folding addition over the list adds its elements.

This is explained in Wikipedia's Fold (higher-order function), which shows how "foldl" and "foldr" differ. For Prolog, Naish gives examples in Higher-order logic programming in Prolog. And I'll just add that in Lambda the Ultimate, Noel Welsh pointed out the Web sites foldl.com and foldr.com! These are "two fun websites that may just help you wrap your head around left and right folds ... the product of Oliver Steele, designer of (Open)Laszlo". When I tried foldl.com, I got an anti-phishing warning, but foldr.com still works.

I sometimes use the following implementations of foldl and foldr:

/*  foldl( +Op, +Zero, +List, -Result ):
       foldl( Op, Zero, [E1..En], Result )
       calculates Result to be
       (En Op (En-1 Op (...E1 Op Zero))).
*/
foldl( Op, Zero, List, Result ) :-
  foldl( Op, Zero, List, Zero, Result ).


foldl( _, _, [], A, A ) :- !.

foldl( Op, Zero, [X|Xs], A0, A ) :-
  call( Op, X, A0, A1 ),
  foldl( Op, Zero, Xs, A1, A ).


/*  foldr( +Op, +Zero, +List, -Result ):
       foldr( Op, Zero, [E1..En], Result )
       calculates Result to be
       (E1 Op (E2 Op (...En Op Zero))).
*/
foldr( _, Zero, [], Zero ) :- !.

foldr( Op, Zero, [A|As], Result ) :-
  foldr( Op, Zero, As, Rs ),
  call( Op, A, Rs, Result ).


foldr( _, _, Zero, [], Zero ) :- !.

foldr( Op, F, Zero, [A|As], Result ) :-
  foldr( Op, F, Zero, As, Rs ),
  call( F, A, FA ),
  call( Op, FA, Rs, Result ).


lambda( A1, A2, GoalTemplate, X1, X2 ) :-
  copy_term( A1-A2-GoalTemplate, X1-X2-Goal ),
  call( Goal ).

The lambda/5 at the end is a variant of Schimpf's; it creates anonymous predicates with three arguments. You can see it here, where I calculate the sum and product of a list of numbers:

20 ?- call( lambda( X, Y, Z, Z is X+Y ), 1, 2, Result ).
Result = 3.

21 ?- foldr( lambda( X, Y, Z, Z is X+Y ), 0, [1,4,5,3,2], Result ).
Result = 15.

22 ?- foldr( lambda( X, Y, Z, Z is X*Y ), 1, [1,4,5,3,2], Result ).
Result = 120.

So now I want to get back to "span" and span_count. Unsurprisingly, both predicates recurse in the same way over the string they're matching, but span_count has one more argument. Can I define both in terms of foldl, or foldr, or some other higher-order predicate?

I couldn't see how to do so in terms of foldl or foldr, though I may just be not seeing something obvious. But I then tried generalising "span" in a mindlessly automatic way. I did so by replacing the call to "any" by a call to a predicate, P1, passed as an argument: it's the "any(Chars)" passed to segfold/4 in the source that I'm about to show you.

Similarly, I replaced the calculation of the matched string — the code that prepends the C matched by "any" onto the substring matched by "span_tail" — by a call to a predicate, Op, passed as another argument. This is the

lambda( [X], Y, Z, Z=[X|Y] )
passed to segfold/4 in the same source.

I also created an argument position to hold the "zero" value for the matched string, namely the [] passed to segfold/4.

I did the same for span_count, then made arguments consistent between the generalised "span" and the generalised span_count, then renamed variables to be consistent with my fold and foldr. This gave me the following code. In it, I've named the equivalents of "span" and "span_count" "ho_span" and "ho_span_count". This avoids a clash with the original span and span_count. The "ho_" stands for "higher order" (meaning that I implemented them using higher-order programming, not that they are themselves higher order). The higher-order predicate that I generalised "span" and span_count to, I've named "segfold", because it seems to do a fold over segments of a list rather than individual elements. Here is the source:

ho_span( Chars ) -->
  ho_span( Chars, _Matched ).


ho_span( Chars, Matched ) -->
  segfold( any(Chars)
         , lambda( [X], Y, Z, Z=[X|Y] )
         , []
         , Matched
         ).


ho_span_count( Chars, Len ) -->
  ho_span_count( Chars, Len, _Matched ).


ho_span_count( Chars, Len, Matched ) -->
  segfold( any_len(Chars)
         , lambda( [X1]-X2, Y1-Y2, Z1-Z2
                 , ( Z1=[X1|Y1]
                   , Z2 is X2 + Y2
                   )
                 )
         , []-0
         , Matched-Len
         ).


any_len( Chars, M-1 ) -->
  any( Chars, M ).


segfold( P1, Op, Zero, Res, In, Out ) :-
  call( P1, HeadRes, In, InAfterP1 ), !,
  segfold_tail( P1, Op, Zero, TailRes, InAfterP1, Out ),
  call( Op, HeadRes, TailRes, Res ).


segfold_tail( P1, Op, Zero, Res ) -->
  segfold( P1, Op, Zero, Res ), !.

segfold_tail( _P1, _Op, Zero, Zero ) -->
  [].

In this code, the lambda-expression that calculates the matched string for "ho_span" prepends the character matched by "any" onto the rest of the matched string, as I've said. The lambda-expression in "ho_span_count" does this too, but it also updates the value of the length of this string. So it carries out a pair of calculations. This is why its body has two goals. It is also why each argument is a pair of terms joined by "-". The first element of each pair is about the matched string, abd the second element is about its length. I'm not sure how readable this makes the code...

The "any_len" predicate is like "any", but also returns the length of the string it matches. This is always 1, but it might not be were I to replace "any" by something else. Despite having "lambda", I defined "any_len" as a non-anonymous predicate: that seemed easier to understand.

To conclude: I've derived, by rather mindless symbol-shuffling, a higher-order predicate for iterating over lists being matched by DCG rules, and used it to implement two useful patterns. I can't quite see whether my higher-order predicate can be reduced to the well-known foldl or foldr. And I'm not sure whether it is the best way of expressing what it does. But it saves typing, and enables me to reuse code.