Access Issues: Breakout Group Report

Draft 6/22/09

Jeff Altschul (Archaeology), Ted Bestor (Sociocultural Anthropology), Jeff Good (PI; Linguistics), Keith Kintigh (Chair; Archaeology), Matthew Tocheri (Physical Anthropology), Peter Wittenburg (Linguistics)

Toward an Integrated Plan for Digital Preservation and Access to Primary Anthropological Data (AnthroDataDPA: A Four-Field Workshop)

PIs: Carol R. Ember, Eric Delson, Jeff Good and Dean Snow

May 18-20, 2009

The Access Issues breakout group addressed a variety of questions concerning access to digital anthropological data contained in formal disciplinary repositories.

Repository scope. In considering these questions, the group made several observations concerning the nature and scope of these repositories. It was recognized first that formal repositories are needed and that investigator- or project-oriented data silos are not and will not be financially or technically sustainable, nor will they likely provide the sorts of access (and access control) that are needed. However, it was the group’s contention that a unified repository structure for all anthropology is unlikely to be the best solution. The scope of anthropological repositories should be based on shared needs for functionality and the nature of the data at issue. The fields of anthropology are sufficiently divergent in terms of research goals and the data used to address research questions that trying to unite them now is neither realistic nor necessarily desirable. Yet, as more focused repositories develop, it would be well for there to be communication and agreement on some metadata standards and some tools that can be shared across repositories. Further, anthropological repositories need not and should not restrict themselves to “primary” data. The decision as to what should be archived will, of necessity, change over time and be driven to a large extent by a cost/benefit analysis undertaken by individual analysts in relation to guidelines set by the various subfields and funding agencies.

To what groups do we have responsibilities to provide access? The question of responsibility is to an extent intertwined with how the work was funded and what sorts of individuals might realistically desire access. We see the answer as a sort of priority list, in which we should attend most carefully to delivering access to the groups most interested and most likely to use it, namely anthropologists and other members of the scholarly community. In many cases we have strong ethical obligations to provide access to our informants and to members of the communities that are the subjects of our research. To the extent the data are generated with public money, we have clear responsibilities to provide access to the general public, unless otherwise restricted by legal or ethical considerations.

Who are and who might be the consumers of anthropological data? While there are important exceptions, in general we see no reason to restrict access to anthropological data. The group does not believe that it is possible in practice or advisable in principle to use access control to prevent uses that we may not like (e.g., by creationists or racists). There is a great variety of possible audiences, listed below, with the top three the most highly prioritized:

  • Professional Anthropologists/Graduate Students
  • Other Scholars
  • Informants or Subjects and Subject Communities
  • Government agencies
  • Journalists
  • Advocacy groups
  • General Adult Public
  • College Students
  • K-12 Students
  • Commercial Interests
  • Unanticipated Users in Future Generations

Time frame for the development of an information infrastructure. Data are rapidly degrading in quality and being lost on a continuing basis. Much has already been lost irretrievably. We badly need functional repositories as soon as possible. These repositories need to be open to a broad range of depositors and backed up by institutional (including funding agency, university, professional association) commitments. It appears that sociocultural anthropology is the farthest behind in this regard.

Time frame for data ingest and public access. Data should be deposited in a trusted repository during or as soon after data collection as possible in order that the needed metadata can be accurately and inexpensively collected and that a secure copy of the data is maintained. However, the repository should provide the ability for the investigator to have exclusive access to the data (or for the investigator to directly control access by others) for a reasonable period of time to permit publication. What is a reasonable time for investigator control may differ by subdiscipline depending upon the dominant publication modes. Enforced mandates from funding agencies and better guidance from professional societies would be most helpful in defining appropriate limits. With public funding, the group felt that 3-5 years after the termination of the grant under which the data were collected was a reasonable limit, with 5 years for dissertations. In any case, 10 years seemed like an absolute maximum to restrict access to protect the investigator’s publication interests.
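
As an illustration only, the time limits suggested above could be encoded by a repository roughly as follows. This is a minimal Python sketch under stated assumptions, not a policy adopted by the group: the names `SUGGESTED_MAX_YEARS`, `ABSOLUTE_MAX_YEARS`, `add_years`, `release_date`, and `exceeds_guideline` are hypothetical, and the sketch assumes the embargo is counted in whole years from the end of the grant.

```python
from datetime import date

# Assumed values, taken from the figures discussed by the group.
SUGGESTED_MAX_YEARS = 5   # upper end of the 3-5 year range (also the dissertation figure)
ABSOLUTE_MAX_YEARS = 10   # the group's suggested absolute maximum


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def release_date(grant_end: date, requested_years: int) -> date:
    """Date on which the data become public, capped at the absolute maximum."""
    years = max(0, min(requested_years, ABSOLUTE_MAX_YEARS))
    return add_years(grant_end, years)


def exceeds_guideline(requested_years: int) -> bool:
    """True if the request is longer than the suggested 3-5 year range."""
    return requested_years > SUGGESTED_MAX_YEARS
```

Under these assumptions, a request longer than ten years is simply truncated to ten, while a request between five and ten years is allowed but can be flagged for review with `exceeds_guideline`. Actual limits would be set by subdiscipline, funding agency, and professional society.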

Rapid deposit is highly desirable because, as time passes, the ability to obtain the needed metadata declines and the likelihood of data loss increases rapidly. Rapid deposit may also be advantageous to the investigator as it encourages organization of the data and facilitates sharing with collaborators.

Requirements for deposit according to established guidelines should be implemented as soon as functional repositories are available. In many cases it seems reasonable to mandate that, at the time of publication, supporting data should be accessible in a trusted public repository. Use of these repositories should be enforced through peer review of both publications and grants.

Granularity of metadata. It is in the nature of many kinds of anthropological research that data are collected at multiple levels (e.g., individual and community, site and artifact, linguistic corpus and session). Metadata are likely to be similarly complex, and metadata requirements will vary across subfields and may be multilevel. For example, in archaeology it has proved efficient to collect metadata that apply to an entire project and separately to collect more refined metadata that refer to specific datasets that are part of that project.
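
The two-level arrangement described for archaeology can be pictured as a simple nested record. The following Python sketch is illustrative only; the class names `ProjectMetadata` and `DatasetMetadata` and all of their fields are hypothetical and do not correspond to any existing repository's schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetMetadata:
    """Refined metadata for one dataset within a project (hypothetical fields)."""
    title: str
    data_type: str          # e.g., "artifact table" or "session recording"
    description: str = ""


@dataclass
class ProjectMetadata:
    """Metadata that apply to an entire project (hypothetical fields)."""
    title: str
    investigator: str
    subfield: str
    datasets: List[DatasetMetadata] = field(default_factory=list)

    def add_dataset(self, dataset: DatasetMetadata) -> None:
        """Attach dataset-level metadata to this project."""
        self.datasets.append(dataset)


# Project-level metadata are entered once; each dataset adds only its own details.
project = ProjectMetadata(
    title="Example Project",
    investigator="Example Investigator",
    subfield="archaeology",
)
project.add_dataset(DatasetMetadata(title="Example Dataset", data_type="artifact table"))
```

The design choice sketched here is that information shared by every dataset is recorded once at the project level, so that dataset records stay short. Other subfields might need more than two levels.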

Problems in making data public. It is the group’s position that adequately documented data should become public unless there are compelling reasons they should not be. However, particularly in sociocultural and medical anthropology, confidentiality responsibilities will need to be rigorously observed. In some cases, access will be determined by clear-cut consent agreements or IRB stipulations. In other cases the investigator may perceive unstated “sensitivity” by descendant communities that might in some cases contrast with expressed desires of the subject communities. There are a number of very difficult issues here and we see no easy answers. Beyond explicit agreements, should the investigator alone be able to decide on sensitivity? What happens after the investigator is gone? Should professional societies be involved in gate-keeping by the repositories to provide a viewpoint with more distance?

How do we lower the barriers to entry to the repositories? There are a number of impediments to the effective adoption and use of digital repositories. The main ones are cost/time impediments and technology-related impediments. These will affect the scope of the data that are deposited for a given project or endeavor. Investigators are sure to contemplate the tradeoffs between the costs in time and money of depositing a given set of data and the benefits to the investigator and to the field more broadly. We believe that these tradeoffs are likely to be evaluated differently by subdiscipline. It may be, for example, that the costs of digitizing and depositing extensive sociocultural anthropology field notes will be relatively high compared to the perceived benefit.

To the extent that these tradeoffs are actively evaluated, we need to change reward structures (e.g., through grant or publication incentives or requirements) to encourage deposit of data. More broadly, we need to change disciplinary norms about what constitutes responsible professional behavior with respect to depositing different classes of data. Professional societies can play an active role in this regard. Archaeology is most advanced, with ethical standards that clearly require access to data. Other ways of encouraging deposit will be to require attribution of credit (or better, formal citation) of deposited data and to value these citations professionally as we value ordinary publication citations.

The disincentives to deposit can be diminished by maximizing ease of use and minimizing cost. However, even with software tailored to streamline use, there will be a necessary tradeoff between the time investment required and the quality of the metadata and data obtained. Finally, prominent and compelling examples will be invaluable in demonstrating the scholarly value of deposit.

In this context, it is important to distinguish between “new” and “legacy” data. For projects that are just starting, digital archiving is a much simpler problem. The costs of archiving can be built into the project, as can the procedures, metadata standards, and the identification of the ultimate repository. Projects that are complete or that are ongoing present a very different set of problems. The data were not collected with digital archiving in mind, and often the investigators are dead or incapable of placing the data in acceptable formats or creating the metadata needed to make them usable. Even in cases in which the investigator is willing to invest the time and energy, there is great difficulty obtaining financial support. The two situations are qualitatively different and require very different solutions. Solving the archiving issues for new projects is simpler and easier and should proceed first. Professional societies and funding agencies should set guidelines for new projects and begin to enforce them at the same time they tackle the much more difficult issues involved with legacy data.

What does it take for a user to get access? There was a strong consensus that the repositories must have secure platforms with strong safeguards to prevent access to sensitive materials by individuals who should not be authorized for access. This demands not only a login but also ways of reliably authenticating user credentials. It was generally but not universally accepted in the full group that a login should be required even for access to material that is not in some way restricted. User agreements, informed by professional ethics, will need to be established by the repositories.
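
The access rules discussed above can be summarized in a short decision function. The Python sketch below is an illustration under stated assumptions, not a description of any existing repository platform: the `User` and `Item` classes, the `can_access` function, and the `login_required_for_open` switch (which reflects the point on which the full group did not fully agree) are all hypothetical, and real authentication of credentials is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class User:
    """A repository user; 'authenticated' is assumed to be set by a login system."""
    name: str
    authenticated: bool = False
    authorized_items: Set[str] = field(default_factory=set)


@dataclass
class Item:
    """A deposited item; 'restricted' marks sensitive material."""
    item_id: str
    restricted: bool = False


def can_access(user: Optional[User], item: Item,
               login_required_for_open: bool = True) -> bool:
    """Decide whether a user (or an anonymous visitor, None) may see an item."""
    logged_in = user is not None and user.authenticated
    if item.restricted:
        # Sensitive material: login plus explicit authorization for this item.
        return logged_in and item.item_id in user.authorized_items
    if login_required_for_open:
        # Unrestricted material still requires a login under this setting.
        return logged_in
    return True
```

Setting `login_required_for_open=False` models the minority view that unrestricted material should be available without a login. Restricted material is never released to an anonymous or unauthorized user in either case.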

