Metadata Issues
Metadata Break-out Group:
Jeanne Altmann, Eric Delson, Eric Kansa, Robert Kemper, Tom Moritz (Chair), Joel Sherzer
“Data” and “Metadata”
“…’data’ are defined as any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor streams, video, audio, algorithms, software, models and simulations, images, etc.” — Program Solicitation 07-601 “Sustainable Digital Data Preservation and Access Network Partners (DataNet)”
Taken in this broadest possible sense, “data” are simply electronically coded forms of information, and virtually anything can be represented as “data” so long as it is machine-readable.
Our group agreed upon a more pragmatic definition of “data” as measurements, observations or descriptions of a referent — such as an individual, an event, a specimen in a collection or an excavated/surveyed object — created or collected through human interpretation (whether directly “by hand” or through the use of technologies).
Metadata are descriptive documentation essential to informing the process of data creation, collection, management and preservation. (This process is now commonly referred to as “data curation”.) Metadata provide information about the original referent, the collection processes, rules of collection, as well as descriptions of data management processes and provisions for access and use of the data (such as licensing of data to specify permitted uses).
Metadata provide key contextual information to facilitate understanding and are intended to assist research within known and predictable scientific domain(s). However, in the Web environment, metadata may also enable discovery and use in as yet unanticipated fields of research; hence, careful efforts should be made to make the descriptive content of metadata intelligible to scientists outside the narrow specialty in which the data were produced.
From a pragmatic perspective, it was agreed that metadata creation is an ongoing process, not a single event, and that metadata may usefully grow over time by accretion, asynchronously, through the efforts of properly qualified contributors. The question of appropriate control over who may contribute to the ongoing development of metadata should be addressed.
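One way to picture metadata that grow by accretion under contributor control is an append-only log with an allow-list. The sketch below is only an illustration of that idea: the class name `MetadataLog`, the contributor labels and the field names are all hypothetical, and the group did not specify any particular mechanism.

```python
from datetime import datetime, timezone


class MetadataLog:
    """Hypothetical append-only metadata log with a contributor allow-list."""

    def __init__(self, authorized):
        self.authorized = set(authorized)  # who may contribute (assumed policy)
        self.entries = []                  # statements are added, never edited

    def add(self, contributor, field, value):
        """Record one metadata statement if the contributor is authorized."""
        if contributor not in self.authorized:
            raise PermissionError(f"{contributor!r} may not contribute metadata")
        self.entries.append({
            "contributor": contributor,
            "field": field,
            "value": value,
            "added": datetime.now(timezone.utc).isoformat(),
        })

    def current(self):
        """Return the latest value per field; earlier statements stay in the log."""
        view = {}
        for entry in self.entries:
            view[entry["field"]] = entry["value"]
        return view


log = MetadataLog(authorized={"collector", "curator"})
log.add("collector", "place", "Example Site A")
log.add("curator", "place", "Example Site A, north trench")  # later refinement
print(log.current())
```

Because earlier statements are kept rather than overwritten, the history of who said what, and when, remains available alongside the current view.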
Metadata Accessibility, Costs, Commonalities
It was also recognized that metadata creation involves serious investment and that care must be taken to ensure optimal and parsimonious approaches. The notion of minimally adequate “fitness for use” is one useful test of a metadata scheme. Our group agreed that for purposes of “discovery” (identification and location) of data across the four anthropological fields represented in the workshop, time, place and manner/mode of collection may be minimally adequate. Beyond “discovery”, for more in-depth research and education purposes, metadata must provide richer descriptive content and detailed contextualization. But because each metadata element carries a cost, great care should be taken to balance cost and benefit in identifying case-specific minimum adequacy. It was noted that careful use of normalization, inference and recursion can achieve significant efficiencies in the design and implementation of metadata schemas.
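The discovery minimum and the efficiency gained from normalization can be sketched together. In the illustration below, a collection event is described once and shared by several records, each record inherits (infers) its time, place and mode from that event, and a simple check reports which discovery fields are still missing. All identifiers and values are made up for the example, and the field names are an assumption rather than a scheme proposed by the group.

```python
# Hypothetical collection events, each described once and shared by many records.
EVENTS = {
    "event-1": {"time": "1999-08", "place": "Example Site A", "mode": "field survey"},
}

# Data records point to an event instead of repeating its description.
RECORDS = [
    {"id": "rec-1", "event": "event-1"},
    {"id": "rec-2", "event": "event-1", "mode": "excavation"},  # local override
    {"id": "rec-3"},  # no event and no fields of its own
]

# Assumed minimum for discovery: time, place and manner/mode of collection.
DISCOVERY_FIELDS = ("time", "place", "mode")


def resolve(record, events=EVENTS):
    """Merge inherited event metadata with the record's own fields."""
    inherited = events.get(record.get("event"), {})
    return {**inherited, **record}  # the record's own values win


def missing_discovery_fields(record, events=EVENTS):
    """List the discovery fields that are absent after inheritance."""
    resolved = resolve(record, events)
    return [name for name in DISCOVERY_FIELDS if not resolved.get(name)]


for rec in RECORDS:
    print(rec["id"], missing_discovery_fields(rec))
```

Describing the event once keeps the cost per record low, while the check makes the “minimally adequate” threshold explicit and testable.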
Dublin Core Metadata
The group discussed the possible application of the Dublin Core Element Set:
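From Guide to Best Practice: Dublin Core (DC 1.0 = RFC 2413), Final Version, 12 August 1999.
The 15 Dublin Core Elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights.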
Although the metadata scheme is in wide use, particularly in the OAI-PMH protocol, it was recognized that some Dublin Core elements may be poorly suited for anthropological applications. For instance, how do we describe “local contributors”: as “author / creator”, as “other contributors”, or as “source”? In some contexts a local community member may consider themselves to be a “steward” or “keeper” of knowledge, or an advocate for the community. We thus believe that, before adoption for widespread use in anthropology, broad metadata standards such as Dublin Core must be closely scrutinized and modified (“qualified”) to meet domain requirements.
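As a rough illustration of what “qualifying” an element could look like, the sketch below attaches a role to the Dublin Core Contributor element and shows how the qualifier can be dropped for consumers that only understand the unqualified element. The role vocabulary is taken from the examples in the discussion above and is not a DCMI term list; the function names and the record layout are assumptions made for this example.

```python
# Roles mentioned in the discussion; an illustrative vocabulary, not a DCMI one.
LOCAL_ROLES = {"steward", "keeper", "advocate"}


def qualified_contributor(name, role):
    """Build a Contributor statement refined by a (hypothetical) role qualifier."""
    if role not in LOCAL_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return {"element": "contributor", "qualifier": role, "value": name}


def to_simple_dc(statement):
    """Drop the qualifier so that unqualified consumers still get a usable element."""
    return (statement["element"], statement["value"])


statement = qualified_contributor("Community member A", "steward")
print(statement)
print(to_simple_dc(statement))
```

Keeping the qualifier separate from the element means a domain-specific role can be recorded without breaking tools that expect only the 15 base elements.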
Ethical Dimensions: Professional Community and Beyond
Data curation is best informed by the researcher or researchers primarily responsible for the collection/creation of the data. (Michener notes: “Comprehensive metadata counteract the natural tendency for data to degrade in information content through time.” (Michener, Ecological Informatics 1 (2006) 4.))
Our group believes that timely generation of appropriate metadata is a professional and ethical obligation and that, in certain contexts, descendant and/or local communities should be involved in the process of metadata creation. This was seen to require normative change among individuals, disciplines, organizations/institutions and governments. It follows that funders, in both the private and public sectors, must recognize metadata, and data curation more generally, as essential and legitimate expenses that must be adequately supported.
The group discussed various incentives and disincentives (“carrots and sticks”) pertaining to metadata creation. It was recognized that in NSF itself there are variations from program to program concerning data curation and that actual enforcement of requirements for data curation can be highly variable.