Sunday, 24 June 2007

Preserving atomicity in IO operations

[Updated 26/07/07: unwind_protect now captures fewer variables.]

There are a bunch of operations that must be executed in pairs; for instance, an opened channel SHOULD be closed. That is: every call to open_in on a file should be followed by a subsequent close_in on the opened channel.

Edging towards a solution:

Lispers actually have a neat way to preserve the atomicity of file descriptor operations: with-open-file.

with_open_file takes the name of the file to open and a function working on the file handle; this function should not close the file handle. A first shot would look like:

let with_open_in file f=
 let ic=open_in file in
 let res=f ic in
 close_in ic;
 res

Although at first glance this looks OK, it breaks down if an exception is raised in f: close_in is then never reached and the channel stays open. We will now introduce a new function from the Lisp world: unwind-protect.
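To see the failure concretely, here is a minimal sketch that repeats the first-shot definition, lets f raise, and then checks whether the channel is still readable. The helper names make_temp_file and leaks_on_exception are illustrative only and are not part of the original code.

```ocaml
(* The first-shot version, repeated so the snippet is self-contained. *)
let with_open_in file f =
  let ic = open_in file in
  let res = f ic in
  close_in ic;
  res

(* Hypothetical helper: create a small temporary file to read from. *)
let make_temp_file () =
  let path = Filename.temp_file "demo" ".txt" in
  let oc = open_out path in
  output_string oc "hello\n";
  close_out oc;
  path

(* Returns true if the channel is still open after f has raised. *)
let leaks_on_exception () =
  let path = make_temp_file () in
  let leaked = ref None in
  (try
     with_open_in path (fun ic -> leaked := Some ic; failwith "boom")
   with Failure _ -> ());
  let still_open =
    match !leaked with
    | Some ic ->
      (* Reading from a closed channel raises Sys_error;
         on an open channel the read simply succeeds. *)
      (try ignore (input_line ic); true with Sys_error _ -> false)
    | None -> false
  in
  (* Tidy up by hand, which is exactly what we wanted to avoid. *)
  (match !leaked with Some ic -> close_in ic | None -> ());
  Sys.remove path;
  still_open
```

With the naive definition, leaks_on_exception () should report that the channel was left open.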

Unwind-protect:

unwind_protect takes two functions, the second one being a cleanup function. unwind_protect f cleanup returns the result of running f (). Whatever happens in f (), cleanup () will be called.

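let unwind_protect f g=
 (* Run the pending cleanup, if any; used as the at_exit hook. *)
 let run f ()=
  match !f with
   | Some f -> f ()
   | None -> ()
 in
 let closeFun=ref (Some g) in
 (* Covers the case where f calls exit instead of returning or raising. *)
 at_exit (run closeFun);
 let res=
  try
   f ()
  with e ->
   (* Disarm the hook first so that g is not run a second time at exit. *)
   closeFun := None;
   g ();
   raise e
 in
 closeFun := None;
 g ();
 res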

with_open_file can now be coded as:

let with_open_in filename f=
 let ch=open_in filename in
 unwind_protect (fun () -> f ch) (fun () -> close_in ch)
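As a usage sketch, the snippet below restates unwind_protect and with_open_in so that it runs on its own, then builds a small line counter on top. count_lines is a hypothetical helper introduced for illustration; this restatement clears the at_exit hook on both the normal and the exceptional path, which is an assumption about the intended behaviour.

```ocaml
(* Compact restatement of unwind_protect so the snippet is self-contained. *)
let unwind_protect f g =
  let closeFun = ref (Some g) in
  at_exit (fun () -> match !closeFun with Some h -> h () | None -> ());
  let res =
    try f ()
    with e ->
      closeFun := None;
      g ();
      raise e
  in
  closeFun := None;
  g ();
  res

let with_open_in filename f =
  let ch = open_in filename in
  unwind_protect (fun () -> f ch) (fun () -> close_in ch)

(* Hypothetical helper: count the lines of a file.
   The callback never closes the channel itself. *)
let count_lines filename =
  with_open_in filename (fun ch ->
    let n = ref 0 in
    (try
       while true do
         ignore (input_line ch);
         incr n
       done
     with End_of_file -> ());
    !n)
```

The caller only ever sees the file name and the result; the channel is closed whether the callback returns or raises.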

Wrapping it up:

We would now like to force the use of our new functions instead of the old ones. We do not want to define a new type of channel, and there is no way to 'hide' the old functions from Pervasives; we can, however, shadow the functions we don't want to allow with a value of an abstract type:

module Abstract:sig
 type t
 val v:t
end
=
struct
 type t=unit
 let v=()
end
let open_out=Abstract.v
let open_in=Abstract.v
let close_out=Abstract.v
let close_in=Abstract.v
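One detail worth spelling out is ordering: the wrappers have to be defined before the shadowing definitions, since they still need the real functions. The sketch below shows one possible layout. The wrappers here use a plain match ... with exception (OCaml 4.02 or later) as a simplified stand-in for the unwind_protect version, and with_open_out is an assumed counterpart that the post does not define.

```ocaml
module Abstract : sig
  type t
  val v : t
end = struct
  type t = unit
  let v = ()
end

(* Wrappers come first: at this point open_in, close_in, open_out and
   close_out still refer to the real standard library functions. *)
let with_open_in filename f =
  let ch = open_in filename in
  match f ch with
  | res -> close_in ch; res
  | exception e -> close_in ch; raise e

let with_open_out filename f =
  let ch = open_out filename in
  match f ch with
  | res -> close_out ch; res
  | exception e -> close_out ch; raise e

(* Shadowing comes last. *)
let open_out = Abstract.v
let open_in = Abstract.v
let close_out = Abstract.v
let close_in = Abstract.v

(* From here on, writing  open_in "data.txt"  no longer type-checks:
   open_in now has type Abstract.t, which is not a function type. *)
```

Code placed after the shadowing can only reach files through with_open_in and with_open_out (or through the fully qualified standard library names, which this trick does not block).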

Conclusion:

This looks like yet another modification one could wish for in the OCaml standard library.
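For readers on a recent compiler: the standard library has provided Fun.protect since OCaml 4.08, which covers the try/cleanup part of unwind_protect (it does not register anything with at_exit). A minimal sketch, assuming OCaml 4.08 or later; first_line is a hypothetical helper used only to exercise the wrapper.

```ocaml
(* Assumes OCaml >= 4.08, where Fun.protect is available. *)
let with_open_in filename f =
  let ch = open_in filename in
  (* ~finally runs whether the callback returns or raises. *)
  Fun.protect ~finally:(fun () -> close_in ch) (fun () -> f ch)

(* Hypothetical helper: read the first line of a file. *)
let first_line filename = with_open_in filename input_line
```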

3 comments:

Anonymous said...

Neat. I am new to OCaml and missed this; it's a common Ruby pattern.

What optimization did you make to reduce the variable capture?

Till said...

I do not fully recall (I should put my blog posts under a VCS). I believe I was using a boolean to check whether the cleanup function should be launched on exit.
By using an option type the only variable that gets captured is a ref to None. Pretty lightweight.

This code is a bit odd and many things could be said.
_You could consider that no cleaning should be done on exit
_The exit function could arguably use a specific exception thus unwinding the stack when it is called.
_It is a shame that at_exit does not provide a means to deregister functions. The GC module provides a fairly elegant way (providing a value that can be used to do the deregistering).
This is a general issue with OCaml: the language is great but it is plagued by a makeshift standard library of very uneven quality.
It is very easy to blame INRIA, but making a proper standard library is a lot of work and can make or kill a language.

Chi said...

This is gorgeous!