Checkpoint and restart facilities
We show how the cluster's automatic checkpointing facilities can be used to save the state of a running program and to restart it from that saved state. This is useful, for example, for programs that run for so long that they do not fit within the available queue time slots.
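To make the idea of checkpoint and restart concrete, here is a minimal sketch of what saving and restoring state means when done by hand inside a program. It is only an illustration of the general concept, not the cluster's automatic facility, which is meant to do this for you. The function name `run`, the `stop_after` parameter, and the JSON checkpoint file are all assumptions made up for this example.

```python
import json
import os
import tempfile


def run(total_steps, checkpoint_path, stop_after=None):
    """Sum the integers 0..total_steps-1, saving a checkpoint after each step.

    Hypothetical example: `stop_after` simulates a job that is cut off
    after a given number of steps (e.g. when its time slot ends).
    Returns the final sum, or None if the run was interrupted.
    """
    # Restart: load the saved state if a checkpoint exists, else start fresh.
    state = {"step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    steps_done_now = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_done_now >= stop_after:
            return None  # simulated interruption; the checkpoint stays on disk

        # One unit of work.
        state["total"] += state["step"]
        state["step"] += 1
        steps_done_now += 1

        # Checkpoint: write to a temporary file first, then rename, so an
        # interruption during the write cannot corrupt the last good checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["total"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "state.json")
    print(run(10, path, stop_after=4))  # first "job" is cut off early
    print(run(10, path))                # second "job" resumes and finishes
```

The second call does not repeat the first four steps: it reads the checkpoint and continues from step 4. Automatic checkpointing aims for the same effect without requiring the program itself to contain this logic.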