What to do if a partition hangs in an error state?
Sometimes a partition hangs in an error state and no new jobs can be started.
The output of llstat in this situation looks like:
    <11> R10 13496 jzam0609 error 0 0 0 V 111 0.59
    <12> R10 13498 jzam0609 error 0 0 0 V 111 0.50

    Midplane usage:
    ---------------
                                        #nodes
    +------------------------------------------+
    | R10-M1: <-->                           0 |
    | R10-M0: <-->                           0 |
    +------------------------------------------+
If a partition hangs in this error state, the problem can be solved by initiating a reallocation of the partition with:
sched_bgl -b <partition name>
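If several jobs are stuck, picking out the affected partitions from the llstat output shown above can be scripted. The following is a minimal sketch, not an official tool: it assumes that job lines start with an index like <11>, that the second field is the partition name and that the fifth field is the job state, all inferred from the sample output rather than from any llstat documentation. The helper names partitions_in_error and reallocation_commands are hypothetical. The script only builds the sched_bgl -b command strings and does not run them, so each command can be checked before it is executed by hand.

```python
import re

# Hypothetical parser: the field layout (index, partition, job id, user, state)
# is ASSUMED from the sample llstat output; check it against your own llstat.
JOB_LINE = re.compile(r"^<\d+>\s+(\S+)\s+(\d+)\s+(\S+)\s+(\S+)")


def partitions_in_error(llstat_output: str) -> list:
    """Return unique partition names (in order of appearance) whose state is 'error'."""
    found = []
    for line in llstat_output.splitlines():
        match = JOB_LINE.match(line.strip())
        if match is None:
            continue  # skip the midplane table, headers and blank lines
        partition, _job_id, _user, state = match.groups()
        if state == "error" and partition not in found:
            found.append(partition)
    return found


def reallocation_commands(llstat_output: str) -> list:
    """Build, but do not execute, one 'sched_bgl -b' command per hung partition."""
    return [f"sched_bgl -b {name}" for name in partitions_in_error(llstat_output)]


if __name__ == "__main__":
    sample = (
        "<11> R10 13496 jzam0609 error 0 0 0 V 111 0.59\n"
        "<12> R10 13498 jzam0609 error 0 0 0 V 111 0.50\n"
        "Midplane usage:\n"
        "| R10-M1: <--> 0 |\n"
    )
    for command in reallocation_commands(sample):
        print(command)
```

For the sample output this prints a single line, sched_bgl -b R10, because both stuck jobs report the same partition field.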
Last change: 07.04.2006
