Sunday, 10 February 2008

The power of doing nothing....

There's an ongoing discussion on COCAN on how to make errors explicit in function signatures (the proposed solution is to go exceptionless). I will not delve on why I do not think this is a standard that should be adopted. Instead we will explore a very lightweight solutions that preserves the stack unwinding capability of exceptions whilst making the possible errors explicit in the signatures of the function.

Basic types

When one of our function is applied there we can have either one of two possible outcomes: everything goes right or we are left over with an error (notice the cheap mnemonics). Now if we get rid of exceptions we want our return types to be able all this information. To do so we mimics Haskell's either type:

type ('a,'b) either = Left of 'a | Right of 'b

In case you haden't caught my subtle hints Right is used to hold results and Left to hold errors.

For our short example we shall have a limited number of possible errors:

type possible_exc = [`Too_Small|`Too_Big|`Div_by_zero|`Some_other_error]

that can be thrown in an exception:

exception MyExc of [< possible_exc]

(Allthough I may not be using any unsafe features of OCaml I am actually exploiting a bug in the type-checker here. The absence of this hack would make one of our functions a little less precise).

External interface

Ideally we'd want to choose functions should raise an exception or return a wrapped value. This choice has to be made at function application. Bearing this in mind our external interface is:

module type External = sig
  type ('a,'b) value constraint 'a = [< possible_exc ]
  val run : ('a -> ('b,'c) value) -> 'a -> 'c
  val run_exn : ('a -> ('b,'c) value) -> 'a -> ('b,'c) either
end

Internal interface

The idea behind this implementation is to use a phantom type attached to a value we shall pass along. This phantom type will be unified with every single single value we might raise thus acting as a "collector" for all the possible exceptions. Without further ado let proceed:

module type Internal = sig
  type 'a t constraint 'a = [< possible_exc ]
  type ('a,'b) value constraint 'a = [< possible_exc ]
  val create : unit -> 'a t
  val raise : 'a t -> 'a -> 'b
  val return : 'a t -> 'b -> ('a,'b) value
  val lift : 'a t -> ('a,'b) value -> 'b
end

The type t is our collector. It can only be created via the create function. raise and return are rather self explanatory, lift might come as more of a surprise: it is used to get the returned values from one our function. It ensures the possible errors of the called functions are unified with the "collector" of the callee. The internal interface does not expose functions like External.run because it discards the possible error cases.

Implementation

Our interfaces being set we will now present an implementation. It has been stated at the beginning of this post that we want it as lightweight as possible. Since we will (ab)use the identity functions let's get it out of the way:

let ident = fun i -> i

We are not using the more efficient external "%identity" function because it is, in essence, not type safe. If we consider that the unit value is a value that carries no information (the unit type being inhabited by only one value) and the identity function is a function that does nothing our implementation really doesn't do much:

module Impl = struct
  type 'a t = unit constraint 'a = [< possible_exc ]
  type ('a,'b) value = 'b constraint 'a = [< possible_exc ]
  let create = ident
  let raise () e = raise (MyExc e)
  let return () = ident
  let run = ident
  let lift () = ident
  let run_exn f v = try Right (f v) with  MyExc v -> Left v
end

Excercising it

Let's take our shiny new modules for a test run. The types commented are the inferred types and thus the most general ones.

module I:Internal = Impl

(* int -> ([< possible_exc > `Too_Big `Too_Small ], unit) I.value *)
let check_range i =
  let t = I.create () in
  if i > 5 then
    I.raise t `Too_Big
  else if i < -5 then
    I.raise t `Too_Small
  else
    I.return t ()

(* 'a -> int -> ([< possible_exc > `Div_by_zero ], int) I.value *)
let div x y =
  let t = I.create () in
  if y = 0 then
    I.raise t `Div_by_zero
  else
    I.return t 0

(*
 int ->
 int ->
 ([< possible_exc > `Div_by_zero `Too_Big `Too_Small ], int) I.value
*)
let div_in_range x y =
  let e = I.create () in
  I.lift e (check_range x);
  I.lift e (check_range y);
  I.return e (I.lift e (div x y))

Conclusion

Even though this actually works it does not scale well to multiple modules. It should be considered as a proof of concept rather than serious code.

As there name imply exception should be used to handle exceptional cases. A well designed library should only throw exception when it encounters an error or a bug. Exception should only be used for error recovery (they might be used internally for hacks such as bugtracking but should never trickle out of the library). It seems cumbersome to impose the callee of library functions to handle a long trail of more and more exotic error cases.

If one really desires to have explicit error handling combining our either type with monads seems to be the sane route. This is exactly what the haskell error monad does.

Talking about haskell, an interesting read is 8 ways to report errors in Haskell. It gives a good overview on how messy and uncoherent error reporting can get.

Tuesday, 27 November 2007

Bilingual "hello world"

Here is a fun (and slightly useless) hack:

#cd "."(*
echo "Hello world"
<<"OCAMLCODE_END"
*)
let () = print_endline "Bonjour le monde"
(*
OCAMLCODE_END
#*)

This program is both a shell program and an ocaml program. If you run it using sh it will print "Hello world" but if you run it in ocaml it will output "Bonjour le monde" (Ocaml is a french programming language after all).

There is actually a small interest in this hack: suppose you want to run an ocaml script but need to make a couple of checks before running it (for instance checking whether findlib is installed or the interpreter is recent enough) you can now bundle it as a shell executable that calls itself again after having done the checks as an ocaml program.

Sunday, 24 June 2007

Preserving atomicity in IO operations

[Updated 26/07/07: unwind_protect now captures less variables.]

There are a bunch of operation that must be executed in pairs, for instance openned channel SHOULD be closed. That is: every call to an open_in on a file should be followed by a subsequent close_in on the openned channel.

Edging towards a solution:

Lispers actually have a neat way atomicity of file descriptor operations. with-open-file

with_open_file takes the name of the file to and a function working on the file handle, this function should not close the file handle. A first shot would look like:

let with_open_in file f=
 let ic=open_in file in
 let res=f ic in
 close_in ic;
 res

Although at a first glance this looks ok it will break down if an exception is raised in f. We will now introduce a new function from the lisp world. unwind-protect

Unwind-protect:

unwind_protect takes two functions, the second one being a cleanup function. unwind_protect f cleanup returns the result of running (). Whatever happens in (), cleanup () will be called.

let unwind_protect f g=
 let run f ()=
  match !with
   | Some f -> f ()
   | None -> ()
 in
 let closeFun=ref (Some g) in
 at_exit (run closeFun);
 let res=
  try
   f ()
  with e ->
   g ();
   raise e
 in
 closeFun := None;
 g ();
 res

with_open_file can now be coded as:

let with_open_in filename f=
 let ch=open_in filename in
 unwind_protect (fun () -> f ch) (fun () -> close_in ch)

Wrapping it up:

We now would like to force the usage of our new functions instead of the old ones. We do not want to define a new type of channel and there is no way to 'hide' them from Pervasives, we can however override the functions we don't want to allow with an abstract type:

module Abstract:sig
 type t
 val v:t
end
=
struct
 type t=unit
 let v=()
end
let open_out=Abstract.v
let open_in=Abstract.v
let close_out=Abstract.v
let close_in=Abstract.v

Conclusion:

This looks like yet another modification one could wish for in OCaml standard library.

Sunday, 3 June 2007

Phun with phantom types!!

Phantom types are a nifty trick: types are used to store additional information during the type-checking pass. These types have no implementations (there are no values actually having these types) but are still used as type parameter to tag values. This additional info is then used by the type system to statically ensure some conditions are met. As, I'm guessing this is all getting rather intriguing (or confusing) I propose to step through a very simple example.

Without phantom types

Let's start out with a very basic library to handle lists:

(*The empty list*)
let nil=[]
(*Appends an element in front of a list*)
let cons e l = e::l
(*Converts two list of same sizes to a list of couples *)
let combine = List.combine

Combine needs both of its arguments to be of the same length. This is typically a job for dependent types (i.e. types depending on a value) where list length would be encoded in their types. Ocaml doesn't have dependant type but we'll see how to leverage the type inference mechanism to encode lengths.

Encoding the length

Since our types cannot contain values we need to find a way to code integers in our type system. We will using an encoding based on Peano's axiom's:

type zero
type 'length succ

0 is represented by the type zero, 1 by succ zero, 2 by succ succ zero etc... There exist no values having these types: they are empty.

Using the phantom type

The previous type will be the "phantom type": it will be used to encode additional info but won't represent the type of any actual object.

The idea here is to make our lists depend on that type to store the length info. Instead of being 'a list our list type is now:

type ('a,'length) t

where 'length represents the length of the list. Giving types to our previous functions is now straightforward:

val nil:('a,zero) t
val cons:'a -> ('a,'length) t -> ('a,('length succ)) t
val combine:('a,'length) t -> ('b,'length) t -> (('a*'b),'length) t

and since under the hood we are using standard OCaml's list, converting from our list to a normal list is a plain identity. We'll now wrap everything in a nice module in order to hide the internals:

module DepList:
sig
 type zero
 type 'length succ
 type ('a,'length) t
 val nil:('a,zero) t
 val cons:'a -> ('a,'length) t -> ('a,('length succ)) t
 val toList:('a,'length) t -> 'a list
 val combine:('a,'length) t -> ('b,'length) t -> (('a*'b),'length) t
end
 =
struct
 type zero
 type 'b succ
 type ('a,'length) t='a list
 let nil=[]
 let cons e l = e::l
 let combine = List.combine
 let toList l = l
end

Testing it all

And it's now time to play with our library...

open DepList;;
let a=cons 5 (cons 6 nil);;
val a : (int, DepList.zero DepList.succ DepList.succ) DepList.= <abstr>
# toList a;;
- : int list = [5; 6]
let b=cons "a" nil;;
val b : (string, DepList.zero DepList.succ) DepList.= <abstr>
# combine a b;;
Characters 10-11:
  combine a b;;
        ^
This expression has type (string, DepList.zero DepList.succ) DepList.t
but is here used with type
  (string, DepList.zero DepList.succ DepList.succ) DepList.t

That's right we've just statically caught an error because combine was called with two lists of different lengths!

Conclusion

Phantom types are a fun hack to play with, alas they are very restrictive and rarely useful. Their big brothers (GADT's and dependant types) require specific type systems and are tricky to groke.