Tuesday, 27 November 2007

Bilingual "hello world"

Here is a fun (and slightly useless) hack:

#cd "."(*
echo "Hello world"
<<"OCAMLCODE_END"
*)
let () = print_endline "Bonjour le monde"
(*
OCAMLCODE_END
#*)

This program is both a shell script and an OCaml program. If you run it with sh it prints "Hello world", but if you run it with ocaml it prints "Bonjour le monde" (OCaml is a French programming language, after all).

This hack does have a small practical use: suppose you want to run an OCaml script but need to make a couple of checks first (for instance, checking whether findlib is installed or whether the interpreter is recent enough). You can now bundle the script as a shell executable that performs the checks and then calls itself again as an OCaml program.

Sunday, 24 June 2007

Preserving atomicity in IO operations

[Updated 26/07/07: unwind_protect now captures fewer variables.]

There are a number of operations that must be executed in pairs; for instance, an opened channel should be closed. That is, every call to open_in on a file should be followed by a call to close_in on the opened channel.

Edging towards a solution:

Lispers actually have a neat way to preserve the atomicity of file descriptor operations: with-open-file.

Our version, with_open_in, takes the name of the file to open and a function working on the file handle; this function should not close the file handle itself. A first shot would look like:

let with_open_in file f=
 let ic=open_in file in
 let res=f ic in
 close_in ic;
 res

Although at first glance this looks OK, it breaks down if an exception is raised in f: close_in is then never reached. We will now introduce another function from the Lisp world: unwind-protect.
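To make the failure concrete, here is a small sketch of the naive version leaking a channel. The temporary file, the leaked reference and the still_open name are assumptions introduced only for this illustration.

```ocaml
(* The naive version from the post: close_in is skipped if f raises. *)
let with_open_in file f =
  let ic = open_in file in
  let res = f ic in
  close_in ic;
  res

(* Hypothetical test file, created only for this illustration. *)
let path = Filename.temp_file "naive" ".txt"
let () =
  let oc = open_out path in
  output_string oc "still readable\n";
  close_out oc

(* Keep a reference to the channel so that it can be inspected afterwards. *)
let leaked = ref None
let () =
  try with_open_in path (fun ic -> leaked := Some ic; failwith "boom")
  with Failure _ -> ()

(* The channel was never closed: reading from it still succeeds. *)
let still_open =
  match !leaked with
  | None -> false
  | Some ic ->
    let line = input_line ic in
    close_in ic;
    line = "still readable"

let () = Sys.remove path
```

Here still_open evaluates to true, which is exactly the leak we want to avoid.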

Unwind-protect:

unwind_protect takes two functions, the second one being a cleanup function. unwind_protect f cleanup returns the result of running f (). Whatever happens in f (), cleanup () will be called.

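let unwind_protect f g=
 (* Runs the cleanup stored in the reference, if it is still pending *)
 let run f ()=
  match !f with
   | Some f -> f ()
   | None -> ()
 in
 let closeFun=ref (Some g) in
 (* Safety net: run the cleanup at exit if it has not run by then *)
 at_exit (run closeFun);
 let res=
  try
   f ()
  with e ->
   (* Mark the cleanup as done so that it is not run again at exit *)
   closeFun := None;
   g ();
   raise e
 in
 closeFun := None;
 g ();
 res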
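The following sketch shows the order in which things happen, both when the body returns normally and when it raises. It uses a simplified unwind_protect without the at_exit registration (an assumption made to keep the example short), and the log and record helpers are hypothetical names used only to observe the calls.

```ocaml
(* Simplified variant: no at_exit registration, so the cleanup is not
   guaranteed to run if the program calls exit inside f. *)
let unwind_protect f cleanup =
  let res =
    try f ()
    with e ->
      cleanup ();
      raise e
  in
  cleanup ();
  res

(* Hypothetical log used only to observe the order of events. *)
let log = ref []
let record msg = log := msg :: !log

(* Normal return: the body runs, then the cleanup. *)
let ok =
  unwind_protect (fun () -> record "body"; 42) (fun () -> record "cleanup")

(* Exception: the cleanup still runs before the exception propagates. *)
let failed =
  try
    unwind_protect
      (fun () -> record "body"; failwith "boom")
      (fun () -> record "cleanup")
  with Failure msg -> msg

let events = List.rev !log
```

In both cases the log records "body" followed by "cleanup", so events ends up as ["body"; "cleanup"; "body"; "cleanup"].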

with_open_in can now be written as:

let with_open_in filename f=
 let ch=open_in filename in
 unwind_protect (fun () -> f ch) (fun () -> close_in ch)
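For readers on a recent compiler, the standard library's Fun.protect (available since OCaml 4.08) plays a role similar to unwind_protect. The sketch below swaps it in, which is an assumption and not the code of this post, and checks that the channel is closed even when the callback raises. The make_temp_file helper and the leaked reference are hypothetical names used for the demonstration.

```ocaml
(* Sketch: with_open_in built on Fun.protect (OCaml >= 4.08). *)
let with_open_in filename f =
  let ch = open_in filename in
  Fun.protect ~finally:(fun () -> close_in ch) (fun () -> f ch)

(* Hypothetical helper: write a small temporary file to read back. *)
let make_temp_file contents =
  let path = Filename.temp_file "with_open_in" ".txt" in
  let oc = open_out path in
  output_string oc contents;
  close_out oc;
  path

let path = make_temp_file "first line\nsecond line\n"

(* Normal use: the channel is closed once the callback returns. *)
let first_line = with_open_in path input_line

(* Exceptional use: remember the channel so it can be inspected afterwards. *)
let leaked = ref None
let raised =
  try with_open_in path (fun ch -> leaked := Some ch; failwith "boom")
  with Failure _ -> true

(* Input functions raise Sys_error when applied to a closed channel. *)
let closed_after_exception =
  match !leaked with
  | None -> false
  | Some ch ->
    (try ignore (input_line ch); false with Sys_error _ -> true)

let () = Sys.remove path
```

The callback never closes the channel itself; closing is entirely the wrapper's job, which is the point of the pattern.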

Wrapping it up:

We would now like to force the use of our new functions instead of the old ones. We do not want to define a new type of channel, and there is no way to 'hide' the old functions from Pervasives. We can, however, shadow the functions we don't want to allow with a value of an abstract type:

module Abstract:sig
 type t
 val v:t
end
=
struct
 type t=unit
 let v=()
end
let open_out=Abstract.v
let open_in=Abstract.v
let close_out=Abstract.v
let close_in=Abstract.v
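The order of definitions matters here: the safe wrapper has to be defined before the names are shadowed. The sketch below illustrates this under a few assumptions: it shadows only open_in and close_in, it uses a compact with_open_in based on match ... with exception (OCaml 4.02 or later) instead of unwind_protect, and the temporary file is hypothetical.

```ocaml
(* Define the safe wrapper first, while open_in and close_in still
   refer to the standard functions. *)
let with_open_in filename f =
  let ch = open_in filename in
  match f ch with
  | res -> close_in ch; res
  | exception e -> close_in ch; raise e

module Abstract : sig
  type t
  val v : t
end = struct
  type t = unit
  let v = ()
end

(* From here on, the unqualified names are no longer functions. *)
let open_in = Abstract.v
let close_in = Abstract.v

(* A call such as
     let ch = open_in "data.txt"
   is now rejected by the type checker: Abstract.t is not a function type. *)

(* The wrapper still works, because it captured the original functions. *)
let path = Filename.temp_file "shadow" ".txt"
let () =
  let oc = open_out path in
  output_string oc "hello\n";
  close_out oc

let line = with_open_in path input_line
let () = Sys.remove path
```

Note that this is a convention more than a guarantee: the original functions remain reachable through their qualified names.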

Conclusion:

This looks like yet another modification one could wish for in the OCaml standard library.

Sunday, 3 June 2007

Phun with phantom types!!

Phantom types are a nifty trick: types are used to store additional information during the type-checking pass. These types have no implementation (there are no values actually having these types) but are still used as type parameters to tag values. This additional information is then used by the type system to statically ensure that some conditions are met. As I'm guessing this is all getting rather intriguing (or confusing), I propose to step through a very simple example.

Without phantom types

Let's start out with a very basic library to handle lists:

(*The empty list*)
let nil=[]
(*Appends an element in front of a list*)
let cons e l = e::l
(*Converts two lists of the same size to a list of couples *)
let combine = List.combine

combine needs both of its arguments to be of the same length. This is typically a job for dependent types (i.e. types depending on a value), where the length of a list would be encoded in its type. OCaml doesn't have dependent types, but we'll see how to leverage the type inference mechanism to encode lengths.

Encoding the length

Since our types cannot contain values, we need to find a way to encode integers in our type system. We will use an encoding based on Peano's axioms:

type zero
type 'length succ

0 is represented by the type zero, 1 by zero succ, 2 by zero succ succ, etc. (OCaml writes type constructors after their arguments). No values have these types: they are empty.

Using the phantom type

The previous type will be the "phantom type": it will be used to encode additional info but won't represent the type of any actual object.

The idea here is to make our lists depend on that type to store the length information. Instead of being 'a list, our list type is now:

type ('a,'length) t

where 'length represents the length of the list. Giving types to our previous functions is now straightforward:

val nil:('a,zero) t
val cons:'a -> ('a,'length) t -> ('a,('length succ)) t
val combine:('a,'length) t -> ('b,'length) t -> (('a*'b),'length) t

Since under the hood we are using standard OCaml lists, converting from our list to a normal list is simply the identity function. We'll now wrap everything in a nice module in order to hide the internals:

module DepList:
sig
 type zero
 type 'length succ
 type ('a,'length) t
 val nil:('a,zero) t
 val cons:'a -> ('a,'length) t -> ('a,('length succ)) t
 val toList:('a,'length) t -> 'a list
 val combine:('a,'length) t -> ('b,'length) t -> (('a*'b),'length) t
end
 =
struct
 type zero
 type 'b succ
 type ('a,'length) t='a list
 let nil=[]
 let cons e l = e::l
 let combine = List.combine
 let toList l = l
end
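The same encoding can rule out other mistakes. As a sketch, the variant below adds hd and tl functions that only accept non-empty lists; these two functions are hypothetical additions for illustration and are not part of the module above.

```ocaml
module DepList : sig
  type zero
  type 'length succ
  type ('a, 'length) t
  val nil : ('a, zero) t
  val cons : 'a -> ('a, 'length) t -> ('a, 'length succ) t
  (* Hypothetical additions: the argument must have length n + 1. *)
  val hd : ('a, 'length succ) t -> 'a
  val tl : ('a, 'length succ) t -> ('a, 'length) t
  val toList : ('a, 'length) t -> 'a list
end = struct
  type zero
  type 'length succ
  type ('a, 'length) t = 'a list
  let nil = []
  let cons e l = e :: l
  (* List.hd and List.tl cannot fail here: the signature above only
     lets non-empty lists reach them. *)
  let hd = List.hd
  let tl = List.tl
  let toList l = l
end

let l = DepList.(cons 1 (cons 2 nil))
let first = DepList.hd l
let rest = DepList.(toList (tl l))

(* DepList.hd DepList.nil is rejected at compile time, because the type
   zero does not unify with 'length succ. *)
```

The runtime representation is still a plain list; only the signature changes what the type checker accepts.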

Testing it all

And it's now time to play with our library...

# open DepList;;
# let a=cons 5 (cons 6 nil);;
val a : (int, DepList.zero DepList.succ DepList.succ) DepList.t = <abstr>
# toList a;;
- : int list = [5; 6]
# let b=cons "a" nil;;
val b : (string, DepList.zero DepList.succ) DepList.t = <abstr>
# combine a b;;
Characters 10-11:
  combine a b;;
            ^
This expression has type (string, DepList.zero DepList.succ) DepList.t
but is here used with type
  (string, DepList.zero DepList.succ DepList.succ) DepList.t

That's right: we've just statically caught an error, because combine was called with two lists of different lengths!
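For contrast, here is a sketch of the accepted case, where both lists have the same length in their types. The module is repeated so that the example compiles on its own; the values a, b and pairs are chosen only for illustration.

```ocaml
module DepList : sig
  type zero
  type 'length succ
  type ('a, 'length) t
  val nil : ('a, zero) t
  val cons : 'a -> ('a, 'length) t -> ('a, 'length succ) t
  val toList : ('a, 'length) t -> 'a list
  val combine : ('a, 'length) t -> ('b, 'length) t -> ('a * 'b, 'length) t
end = struct
  type zero
  type 'length succ
  type ('a, 'length) t = 'a list
  let nil = []
  let cons e l = e :: l
  let combine = List.combine
  let toList l = l
end

open DepList

(* Both lists have type (_, zero succ succ) t, so combine type-checks. *)
let a = cons 5 (cons 6 nil)
let b = cons "a" (cons "b" nil)
let pairs = toList (combine a b)
```

Since the lengths agree statically, List.combine can never raise Invalid_argument when called through this interface.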

Conclusion

Phantom types are a fun hack to play with; alas, they are very restrictive and rarely useful. Their big brothers (GADTs and dependent types) require specific type systems and are tricky to grok.