Data Preservation Issues

Summary of Breakout Discussion Group

May 19, 2009

Chair: Carol R. Ember (PI), with Anthony Aristar, Jeffrey Clark, Lisa Conathan, Robert Leopold, Daniel Reboussin, and David Glenn Smith

Importance of Digital Preservation

The breakout group stressed the importance of preserving all anthropological research and related materials. Such materials provide the context for understanding the research undertaken, whether qualitative or quantitative. The appropriate analog is the “lab notebook” in the physical sciences, which is deemed critical for evaluating published research. Historians also recognize that other information about the observer is important, and certainly critical for evaluating any biases. Preserving associated materials (diaries, correspondence, etc.) is therefore also of intellectual value.

Why digital preservation?

  1. Physical archives have stored only a very small portion of the anthropological corpus. For example, Robert Leopold of the National Anthropological Archives (NAA) estimated that 500 anthropologists retire each year, but the NAA acquires only 6-8 major collections each year (Schmid, 2008). Universities, with limited funding, must also choose which collections they will take and process. The group speculated on why potential donors have been reluctant to give their materials to archives to date (see below). These reasons are important because they suggest why digital preservation may play an important role in future preservation efforts.
  2. Much of the anthropological data accumulated is now “born-digital” and physical repositories will find it difficult to preserve this material in a form that will be accessible in the future.
  3. Digital preservation can lead to more open access (see the report from the Access group).

Why have anthropologists been reluctant to give their data to physical archives?

We do not have research that bears on this question. However, we felt that answers to this question would have important implications for understanding the need for AnthroData DPA.

Some reasons that were suggested:

Some anthropologists think they will give up their ability to work on their data if they deposit it in an archive, and they are not ready to stop working. Having a digital copy of their materials, or online access to them, would help enormously and should in turn increase physical preservation.

Some anthropologists do not think that archives provide enough access to their work; they would prefer digital access of some kind. While almost any scholar can in theory go to an archive, doing research there is expensive and time-consuming.

Some scholars think they have to be famous for an institution to consider their collection. This is apparently not the case for the National Anthropological Archives: the director, Robert Leopold, says that any anthropologist's collection will be taken. However, perceptions have real consequences: if scholars believe this myth, they may not ask an archive to take their material.

Some simply do not want to face their mortality and do not think about the matter until it is too late.

What are primary data?

The original question posed to this breakout group was how to define primary data. However, the group decided that the distinction between primary and secondary data is unnecessary. Moreover, different fields have very different kinds of data. We settled on the more neutral phrase “anthropological research materials.” These are what are important to preserve.

What kinds of data need to be preserved?

  1. What about multiple formats? There was some debate on whether different forms of data (e.g., handwritten and typed) on the same subjects need to be preserved. The archivists in the group stressed that it is not easy to know in advance how information might be useful in the future, and it is not always clear that two forms are identical, so it is preferable to preserve all forms that are available. Others cited examples of such a practice being a waste of resources, such as preserving a fuzzy and a clear picture of the same subject. However, it was also felt that it is probably more labor-intensive to sort through material to decide what is worth keeping and what is not, so keeping all related materials is probably the best strategy.
  2. What about “gray” literature? There was consensus that “gray” literature (a term widely used for research reports in archaeology produced for contract work) should be digitally preserved. Such literature contains important information, sometimes the only information available on certain sites that will be destroyed or severely impacted.
  3. What if the material is digital but in less-than-desirable formats? There was consensus that if the less-than-desirable formats are all that exist, they should be preserved.

In general, the consensus of the group was that the aim should be to preserve all anthropological research materials.

Can digital object repositories act as long-term preservation?

This was the most controversial issue in the group. Some argued that physical preservation is always the safest long-term preservation strategy for paper. Digital preservation, on the other hand, with migration strategies, may be best for other material such as tapes and objects on computer disks that have shorter life-spans. Others felt that if done properly, digital object repositories can act as long-term preservation strategies and have the advantage of allowing multiple copies to be “housed” in different places (decreasing the risk of destruction from physical or social disasters/upheavals).

However, as mentioned in the history section (NOT YET POSTED), many digital projects do not have plans for long-term preservation in place. If there is any doubt about long-range preservation, both strategies should be pursued.

What efforts moving forward might facilitate future preservation?

Some of the suggestions for encouraging AnthroData DPA are:

  1. Encourage granting agencies to require a preservation plan and provide funding for DPA as part of the research grant. We believe that this will go a long way toward promoting DPA.
  2. Recommend that guidelines for preservation be made part of the anthropological code of ethics.
  3. Develop a donor input system that allows data to be uploaded as research is conducted. Such a system, with appropriate fields and prompts for the necessary metadata, will minimize the labor costs of putting data into archivable form.
     a. Such data need to be accessible only to the researcher at the ingest and other preliminary stages of the research project.
     b. Some fields of metadata can be required at ingest.
     c. The researcher is in a better position than an archivist to enter some metadata (such as the time the object was created, the place, and explanatory captions). There could also be fields for private information that only the researcher would see.
     d. Researchers could add information such as their own classification system, keywords, etc.
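
The donor input system suggested in item 3 could be sketched roughly as follows. This is only an illustration of the ideas discussed (required metadata at ingest, researcher-only access in the early stages, private fields, and researcher-supplied keywords). The group did not specify a schema or a system design, so every class, field, and function name below is a hypothetical choice made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical set of fields that must be filled in before ingest.
# The report only says that "some fields" can be required.
REQUIRED_AT_INGEST = ("title", "created_on", "place", "caption")


@dataclass
class DepositRecord:
    """One uploaded item plus the metadata supplied by its researcher."""
    depositor: str
    title: str = ""
    created_on: Optional[date] = None   # time the object was created
    place: str = ""
    caption: str = ""                   # explanatory caption
    keywords: list = field(default_factory=list)  # researcher's own terms
    private_notes: str = ""             # visible only to the researcher
    released: bool = False              # False during preliminary stages

    def missing_required(self):
        """Return the names of required fields that are still empty."""
        return [name for name in REQUIRED_AT_INGEST if not getattr(self, name)]

    def view_for(self, user):
        """Return the metadata that `user` is allowed to see."""
        is_depositor = user == self.depositor
        if not is_depositor and not self.released:
            raise PermissionError("record is restricted to its depositor")
        visible = {
            "title": self.title,
            "created_on": self.created_on,
            "place": self.place,
            "caption": self.caption,
            "keywords": list(self.keywords),
        }
        if is_depositor:
            visible["private_notes"] = self.private_notes
        return visible


def ingest(record, archive):
    """Add a record to the archive list, refusing incomplete metadata."""
    missing = record.missing_required()
    if missing:
        raise ValueError("missing required metadata: " + ", ".join(missing))
    archive.append(record)
```

In this sketch the researcher, not the archivist, fills in the descriptive fields at upload time, and a record stays invisible to other users until its `released` flag is set. A real system would need proper authentication and a much richer metadata schema.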

Schmid, Oona. 2008. Inside the National Anthropological Archives: An Interview with Robert Leopold. Anthropology News, January: 32-33.
