One of the many benefits of using PACE clusters is the scratch storage, which provides a fast filesystem for I/O-bound jobs. The scratch server is designed for high speed rather than large storage capacity. So far, a weekly script that deletes all files older than 60 days has allowed us to sustain this service without disk quotas. However, this situation has started to change as the PACE clusters have grown to a whopping ~750 active users, ~300 of whom were added since Feb 2011 alone. Consequently, it has become common for scratch utilization to reach 98%-100% on several volumes, which is alarming for the health of the entire system.
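The weekly clean-up described above can be pictured as a small script. The following is a minimal sketch, not the actual PACE script: it assumes a file's age is measured by its last-modification time, and the function name purge_old_files is hypothetical. By default it performs a dry run that only lists the expired files.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60
MAX_AGE_DAYS = 60  # retention window from the policy described above


def purge_old_files(root, max_age_days=MAX_AGE_DAYS, dry_run=True, now=None):
    """Find files under `root` last modified more than `max_age_days` ago.

    Assumption: "age" is measured from the modification time (st_mtime);
    the real PACE script may use a different criterion.
    With dry_run=True (the default) nothing is deleted; the matching
    paths are only returned so they can be reviewed first.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat: judge a symlink by its own timestamp, not its target's.
            if os.lstat(path).st_mtime < cutoff:
                expired.append(path)
                if not dry_run:
                    os.remove(path)
    return sorted(expired)
```

Scheduled weekly (for example with cron), a job like this bounds how long data can stay on scratch, but not how much any one user writes in the meantime.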
We are planning to address this issue with a 2-step transition plan for enabling disk quotas. The first step will be to apply 10TB “soft” quotas to all users for the next 3 months. A soft quota means that you will receive warning emails from the system if you exceed 10TB, but your writes will NOT be blocked. This will help you adjust your data usage and prepare for the second step: 10TB “hard” quotas, which will block writes when the quota is exceeded.
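The difference between the two phases can be summarized in a few lines. This is a hypothetical model of the policy as announced, not how the scratch filesystem implements it; the names check_write and WriteDecision are illustrative, and it assumes 1TB = 10^12 bytes. Many filesystem quota systems (e.g. Linux disk quotas) also attach a grace period to soft limits, which this model ignores.

```python
from dataclasses import dataclass

TB = 10**12  # assumption: decimal terabytes
QUOTA_BYTES = 10 * TB  # the 10TB per-user limit


@dataclass
class WriteDecision:
    allowed: bool      # does the write go through?
    over_quota: bool   # would usage exceed the quota (triggers a warning)?


def check_write(current_usage, write_size, hard=False, quota=QUOTA_BYTES):
    """Hypothetical model of the two enforcement phases.

    Soft phase (hard=False): every write proceeds; exceeding the quota
    only flags the user for a warning email.
    Hard phase (hard=True): a write that would exceed the quota is refused.
    """
    over = current_usage + write_size > quota
    return WriteDecision(allowed=not (hard and over), over_quota=over)
```

For a user at 9TB writing another 2TB, the soft phase lets the write through with a warning, while the hard phase rejects it.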
Considering that the total scratch capacity is 260TB, an even split among 750 users would give each user roughly 0.35TB, so a 10TB quota is a very generous limit, nearly 30 times that share. According to current statistics, no more than 10 users exceed this amount. If you are one of these users (you can check using the command ‘du -hs ~/scratch’) and are concerned that the 10TB quota will adversely impact your research, please contact us (pace-support@oit.gatech.edu).
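If you want to check your usage from within a job script, for example before writing large outputs, the sketch below approximates ‘du -s’ in Python and reports usage as a fraction of the 10TB quota. The function names are hypothetical, and it assumes the quota counts allocated space in decimal terabytes (10^12 bytes); GNU ‘du -h’ reports 1024-based units, so the two figures can differ by roughly 10% at terabyte scale.

```python
import os

TB = 10**12  # assumption: decimal TB; GNU `du -h` prints 1024-based units
QUOTA_BYTES = 10 * TB


def disk_usage_bytes(root):
    """Rough equivalent of `du -s root` in bytes.

    Sums allocated space (st_blocks, 512-byte units on Linux) without
    following symlinks below `root`, and counts hard-linked files once.
    """
    seen = set()
    total = 0

    def add(st):
        nonlocal total
        key = (st.st_dev, st.st_ino)
        if key not in seen:  # skip inodes already counted (hard links)
            seen.add(key)
            total += st.st_blocks * 512

    add(os.stat(root))  # resolve `root` itself if it is a symlink
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            add(os.lstat(os.path.join(dirpath, name)))  # don't follow links
    return total


def quota_fraction(root, quota=QUOTA_BYTES):
    """Return (bytes used, fraction of quota used) for `root`."""
    used = disk_usage_bytes(root)
    return used, used / quota
```

A call such as quota_fraction(os.path.expanduser("~/scratch")) would then give a value near 1.0 for a user close to the limit.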