This is not a repeat from yesterday. Well, it is, just a different server 🙂
UPDATE 2013-08-08 2:23pm
/pb1 is now online, and should not fall over under heavy loads any more.
Have at it folks. Sorry it has taken this long to get to the final
resolution of this problem.
--- Earlier Post ---
Bad news:
If you haven’t been able to tell, the /pb1 filesystem has failed again.
Good news:
We’ve been working on a new load for the OS for all storage boxes
which we had hoped to get out on the last maintenance day (July 17), but
we ran out of time to verify whether it
- was deployable
- resolved the actual issue
Memo (Mehmet Belgin) greatly assisted me in testing this issue by finding some of the cases we’ve known to cause failures and replicating them against our test installs. Many loads were broken, confirming our suspicions and also confirming our new image. It will take heavy loads a LOT better than before.
With verification done, we have been planning to have all Solaris-based storage switched to this by the end
of the next maintenance day (October 15).
However, due to need, this will be going on the PB1 fileserver in just a little bit. We’ve
verified the process of how to do this without impacting any data
stored on the server, so we anticipate having this fileserver back up
and running at 2:30PM, and the bugs which have been causing this
problem since April will have been removed.
I’ll follow up with progress messages.