Institutional repositories – are they a waste of money?

Everyone seems to have their own repository these days: not only groups of interested researchers and individual institutions (e.g. Southampton University) but even departments within universities (e.g. University of Oxford Mathematical Institute). I can understand many of the arguments for such repositories. However, I wonder if developing such large numbers of repositories is truly the right way forward.

At the recent RIN seminar on the future of scholarly publishing there were interesting talks on UKPMC and the University of Liverpool. UKPMC is doing some innovative things: building good discovery interfaces and capturing background data as well as theses and other unpublished works. Where they have links with universities (such as Liverpool) they can then deposit articles onto the university repository.

This got me thinking about the duplication of articles, sitting on various repositories around the world, bringing with them the possibility of version problems and duplication of effort and cost. Are universities wasting their money? The cost of developing, building and maintaining an institutional repository is not small. According to Alma Swan’s recent report for the JISC, Modelling Scholarly Communication Options, the cost of running a repository ranges from £26k to £209k. Figures for development range from “free”(!) to several hundred thousand dollars.

However, cost seems to be no deterrent and the number of repositories is growing rapidly. As I write this, ROAR (the Registry of Open Access Repositories) lists 1925 repositories, of which the vast majority are operated by single institutions or departments. The materials that they hold duplicate to some extent the holdings of larger discipline-based sites such as RePEc and PMC.

I can see various problems with this fragmented model quite apart from the duplication issues: for instance, the types of files accepted for deposit (not fully XML-encoded) and a lack of interoperability (ROAR cannot obtain data from a large number of the repositories that it indexes). In the longer term it is likely that, although technologies will (hopefully) allow for more integration, ongoing maintenance and development of each repository will lead to greater diversification – providing solutions for individual institutions but potential complications for the interoperability of such systems. There may also be funding problems when a small institution suffers budget cuts that lead to reduced collection and upload activity.

At present there is a debate taking place about the correct model for ArXiv. Last year it was announced that ArXiv currently cost $400k p.a. to run, and that this was expected to rise to $500k within the next two years. Along with this announcement, participating institutions were asked to help share the cost. There are questions about whether ArXiv should be expected to capture everything (and allow institutions to harvest its content) or whether it is more sustainable for individual institutions to capture their own content and allow ArXiv to harvest it. ArXiv is very cost-effective; however, the cost needs to be shared in some way if it is to be sustainable.

And what is the role for discipline-based repositories such as UKPMC? If a repository is to serve information-seekers then these are far more useful than (smaller) individual cross-discipline repositories. However, they suffer from a lack of access to the authors of non-mandated materials such as theses, where institutional libraries are better placed to advocate for and coerce deposit.

In these straitened times, a collaborative model may be a better way to minimise costs. It seems to me that local capture onto a centralised repository, which can then provide institution-based or discipline-based “windows” to its content, is the way forward, with the actual content residing in one place, correctly captured, tagged and future-proofed. Such models do exist (e.g. CLACSO in Latin America, and HAL in France), but they appear to be the exception rather than the rule. Nevertheless, this approach should benefit from cost-savings and avoid duplication of effort whilst ensuring a high level of technical development.
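
For readers who think in data structures, the "one store, many windows" idea can be sketched roughly as follows. This is only an illustration under my own assumptions: the class names, field names and sample records are invented for the example and do not describe how CLACSO, HAL or any other real service is built.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One deposited item, held once in the central store (hypothetical fields)."""
    identifier: str
    title: str
    institution: str
    discipline: str


@dataclass
class CentralRepository:
    """Hypothetical single store; every name here is illustrative only."""
    records: dict = field(default_factory=dict)

    def deposit(self, record: Record) -> bool:
        """Local capture: store the record unless its identifier is already held."""
        if record.identifier in self.records:
            return False  # duplicate deposit, nothing is stored twice
        self.records[record.identifier] = record
        return True

    def window(self, **criteria: str) -> list:
        """Return a filtered view; the records themselves are not copied."""
        return [
            r for r in self.records.values()
            if all(getattr(r, key) == value for key, value in criteria.items())
        ]


# Made-up sample deposits, purely for illustration.
repo = CentralRepository()
repo.deposit(Record("a1", "Paper on malaria", "Liverpool", "biomedicine"))
repo.deposit(Record("a2", "Paper on topology", "Oxford", "mathematics"))
repo.deposit(Record("a1", "Paper on malaria", "Liverpool", "biomedicine"))  # rejected

liverpool_view = repo.window(institution="Liverpool")  # institution-based window
maths_view = repo.window(discipline="mathematics")     # discipline-based window
```

The point of the sketch is that each window is just a query over the same stored records, so there is a single version of each article to maintain, however many institutional or subject views are offered on top of it.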

Pippa Smart

Pippa Smart is a publishing consultant working with editorial groups and publishers.


