Data Preservation Issues

Summary of Breakout Discussion Group

May 19, 2009

Chair: Carol R. Ember (PI). Participants: Anthony Aristar, Jeffrey Clark, Lisa Conathan, Robert Leopold, Daniel Reboussin, and David Glenn Smith

Importance of Digital Preservation

The breakout group stressed the importance of preserving all anthropological research and related materials. Such preservation matters because these materials provide the context for understanding the research undertaken, whether qualitative or quantitative. The closest analog is the “lab notebook” of the physical sciences, which is considered critical for evaluating published research. But historians recognize that other information about the observer is also important, and certainly critical for evaluating any biases. So the preservation of associated materials (diaries, correspondence, etc.) is also of intellectual value.

Why digital preservation?

  1. Physical archives have stored only a very small portion of the anthropological corpus. For example, Robert Leopold of the National Anthropological Archives (NAA) estimated that 500 anthropologists retire each year, but the NAA acquires only 6-8 major collections a year (Schmid, 2008). Universities, with limited funding, must always choose which collections they will accept and process. The group speculated on why potential donors have so far been reluctant to give their materials to archives (see below). These reasons matter because they suggest why digital preservation may play an important role in future preservation efforts.
  2. Much of the anthropological data accumulated is now “born-digital” and physical repositories will find it difficult to preserve this material in a form that will be accessible in the future.
  3. Digital preservation can lead to more open access (see the report from the Access group).

Why have anthropologists been reluctant to give their data to physical archives?

We do not have research that bears on this question. However, we felt that answers to it would have important implications for understanding the need for AnthroDataDPA.

Some reasons that were suggested:

Some anthropologists think that depositing their materials in an archive means giving up the ability to work on them, and they are not ready to stop working. Being able to keep a digital copy, or to access the deposited materials online, would help enormously and should therefore increase physical preservation as well.

Some anthropologists do not think that archives provide enough access for their work; they would prefer digital access of some kind. Although in theory almost any scholar can visit an archive, doing research there is expensive and time-consuming.

Some scholars think they have to be famous for an institution to consider their collection. This is apparently not the case for the National Anthropological Archives: its director, Robert Leopold, says that any anthropologist's collection will be taken. However, perceptions have consequences. If scholars believe this myth, they may never ask an archive to take their material.

Some simply do not want to face their mortality and do not think about the matter until it is too late.

What are primary data?

The original question posed to this breakout group was how to define primary data. However, the group decided that the distinction between primary and secondary data is unnecessary. Moreover, different fields have very different kinds of data. We therefore settled on a more neutral phrase, “anthropological research materials.” These are what need to be preserved.

What kinds of data need to be preserved?

  1. What about multiple formats? There was some debate about whether different forms of data on the same subject (e.g., handwritten and typed versions) all need to be preserved. The archivists in the group stressed that it is hard to know in advance how information might be useful in the future, and that it is not always clear that two forms are identical, so it is preferable to preserve every form that is available. Others cited cases in which this practice wastes resources, such as preserving both a blurry and a clear picture of the same subject. However, it was also felt that sorting through material to decide what is worth keeping is probably more labor-intensive than keeping everything, so keeping all related materials is probably the best strategy.
  2. What about “gray” literature? There was consensus that “gray” literature (a term widely used for research reports in archaeology produced for contract work) should be digitally preserved. Such literature contains important information, sometimes the only information available on certain sites that will be destroyed or severely impacted.
  3. What if it is digital but in less-than-desirable formats? There was consensus that if less-than-desirable formats are all that exist, they should be preserved.

In general, the consensus of the group was that the aim should be to preserve all anthropological research materials.

Can digital object repositories serve as long-term preservation?

This was the most controversial issue in the group. Some argued that physical preservation is always the safest long-term strategy for paper, while digital preservation, combined with migration strategies, may be best for material with shorter life-spans, such as tapes and objects on computer disks. Others felt that, if done properly, digital object repositories can serve as long-term preservation and have the advantage of allowing multiple copies to be “housed” in different places, which decreases the risk of destruction from physical or social disasters and upheavals.

However, as mentioned in the history section (NOT YET POSTED), many digital projects do not have plans for long-term preservation in place. If there is any doubt about long-range preservation, both strategies should be pursued.

What efforts moving forward might facilitate future preservation?

Some of the suggestions for encouraging AnthroDataDPA are:

  1. Encourage granting agencies to require a preservation plan and provide funding for DPA as part of the research grant. We believe that this will go a long way to promoting DPA.
  2. Recommend that guidelines for preservation be made part of the anthropological code of ethics.
  3. Develop a donor input system that allows data to be uploaded as research is conducted. Such a system, with appropriate fields and prompts for entering the necessary metadata, will minimize the labor cost of putting data into archivable form.

Such data would need to be accessible only to the researcher during ingest and the other preliminary stages of the research project.

Some metadata fields could be required at ingest.

The researcher is in a better position than an archivist to enter some metadata (such as when and where an object was created, or explanatory captions). There could also be fields for private information that only the researcher would see.

Researchers could also add information such as their own classification systems and keywords.
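The ingest rules just described (required metadata at ingest, researcher-only access during preliminary stages, private fields, researcher-supplied keywords and classification) can be illustrated with a minimal sketch of a deposit record. This is a hypothetical illustration, not a design the group adopted: the class `DepositRecord`, the `Stage` values, the choice of `created_at`, `place`, and `caption` as the required fields, and the assumption that records become viewable by others once archived are all invented here for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    # Hypothetical stages; the report only distinguishes ingest and
    # other preliminary stages from later ones.
    INGEST = "ingest"
    PRELIMINARY = "preliminary"
    ARCHIVED = "archived"


# Hypothetical choice of metadata fields that must be filled in at ingest.
REQUIRED_AT_INGEST = ("created_at", "place", "caption")


@dataclass
class DepositRecord:
    researcher: str
    created_at: Optional[datetime] = None   # when the object was created
    place: Optional[str] = None             # where it was created
    caption: Optional[str] = None           # explanatory caption
    keywords: List[str] = field(default_factory=list)
    classification: Optional[str] = None    # researcher's own scheme
    private_notes: Optional[str] = None     # only the researcher ever sees this
    stage: Stage = Stage.INGEST

    def missing_required(self) -> List[str]:
        """Return the names of required ingest fields that are still empty."""
        return [name for name in REQUIRED_AT_INGEST if not getattr(self, name)]

    def can_view(self, user: str) -> bool:
        """Before archiving, only the depositing researcher may view the record.

        Assumption: once archived, the record is open to other users.
        """
        if self.stage in (Stage.INGEST, Stage.PRELIMINARY):
            return user == self.researcher
        return True

    def view_for(self, user: str) -> dict:
        """Return the fields `user` may see; private notes stay with the researcher."""
        if not self.can_view(user):
            raise PermissionError(f"{user} cannot view this record yet")
        view = {
            "researcher": self.researcher,
            "created_at": self.created_at,
            "place": self.place,
            "caption": self.caption,
            "keywords": list(self.keywords),
            "classification": self.classification,
        }
        if user == self.researcher:
            view["private_notes"] = self.private_notes
        return view
```

For example, `DepositRecord(researcher="alice", place="Field site A").missing_required()` returns `['created_at', 'caption']`, which is the kind of prompt such a system could show the researcher before accepting an upload. Keeping private notes out of every view except the researcher's mirrors the report's suggestion that some fields be visible only to the depositor.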

Schmid, Oona. 2008. Inside the National Anthropological Archives: An Interview with Robert Leopold. Anthropology News, January: 32-33.

  1. Joel Halpern says:

    I find some of these observations of anthropologists a bit outside my world view. One might think that anthropologists, as organized researchers, would keep careful records and then, after their material is published (or when it seems obvious that some or all of it may never be published), would see to its deposit in an appropriate archive.

    My personal experience is that, as noted, at least some anthropologists think that their original notes can then be thrown away. Then there are those who, for one reason or another, never make a disposition of their papers. This seems often to be the case especially for those who have had careers outside of academia. In the last few years I have personally known of several cases in which books, phonograph records (of ethnic music), photographs, and field notes have all ended up in the dumpster. This would seem to parallel the trend in which libraries, with their emphasis on digital materials, regularly dump large bound runs of scholarly journals. Such losses are particularly common among anthropologists who die without close kin or friends, since there is usually a need to clean out their apartments or houses quickly for financial reasons. There are also cases where close surviving kin are very much aware of the archival material of the deceased but, for various reasons, never make a decision before they themselves die or become disabled. The ultimate decision is then left to less close kin or others, and quick disposal is again the solution. These actions or inactions can be seen as alternate routes to the dumpster. In some cases books (especially those containing notes) can be of significant value, but unless they are unusual in some way their general value to dealers is about $3 per volume. This would, of course, include University Press publications. Libraries and their associated archives often have widely divergent policies.

    Some, like Columbia University (from which I graduated), have a long-established history of indifference to the archives of their faculty and graduates. The classic case is, of course, the papers of Franz Boas, which are at the American Philosophical Society in Philadelphia.

