
# lifecycle

# Schema

# What we mean by 'Lifecycle'

The schema designation lifecycle refers to the stages data pass through, from their conception, when a data collector creates a dataset, to a researcher sharing the results of an analysis in a publication. Data are not required to move through every stage of the lifecycle, but a typical progression can include collecting, structuring, entering values, formatting or "cleaning", safe-keeping (backing up or archiving), analyzing, visualizing, making arguments based upon the data, and finally sharing. For any given dataset, it is common for different people to be involved at each stage in this sometimes-linear, sometimes-iterative lifecycle.

The people involved at each stage will necessarily make decisions that end up impacting future users. For instance, a data collector might wonder, "Should I let my survey participants choose from a set of pre-defined gender options, which will make analysis simpler, or should I leave that field open-ended to allow for more nuance?" Likewise, a data analyst might ask, "Should I delete columns irrelevant to my own research to improve readability, or might these fields someday be useful to someone else?"

In making choices, sometimes humans are deliberate and thoughtful. Other times, and this can be especially true when technology is involved, people default to expediency, making quick judgements as they experiment with tools and techniques until they hit upon workable solutions to the task at hand. It is common, especially in this latter scenario, for people to forget to document their workflow, preserving instead only the final results.

Much is at stake when people do not document their decision-making process for the changes they have made to data. Without that record, it is harder for someone else using the data to understand how it came to be, which can affect these future users' own decisions about how to approach the data.

The goal of this section of the schema is to encourage, to the degree possible, clear and thorough documentation of human decisions throughout the course of the data lifecycle, ultimately striving to present context about data to users in a way that makes the 'data genealogy' or progression through the lifecycle as clear and understandable as possible.


Because libraries that archive data deal largely with changes made to data after the original collection process, we begin our lifecycle with acquisition. You can find notes, where available, on collection methodology in the considerations section of the schema.
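
As a rough illustration of how this documentation can be read back, the following Python sketch walks a record shaped like the one in the Example use section, starting with acquisition, and lists each documented decision. The function name `summarize_lifecycle`, the trimmed record, and the wording of the summary lines are assumptions made for this sketch; they are not part of the schema or of any LMEC tooling.

```python
import json

# A trimmed copy of the example record from this page, wrapped in outer
# braces so that it parses as a standalone JSON document (an assumption
# of this sketch; the page shows only the "lifecycle" fragment).
RECORD_JSON = """
{
  "lifecycle": {
    "acquisition": {
      "ingredients": [
        {"$id": "ark:/76611/dkgskarjx",
         "notes": "The Leventhal Map & Education Center changed this ingredient dataset to create the new simplified towns dataset"}
      ]
    },
    "maintenance": {
      "officialMaintainer": "Leventhal Map & Education Center",
      "maintenanceFrequency": "This dataset is not frequently updated"
    },
    "processing": {
      "choices": [
        {"title": "Step-by-Step Process",
         "author": "Garrett Dash Nelson",
         "format": "Jupyter Notebook",
         "relatedResourceURL": "https://github.com/nblmc/massachusetts-municipal-boundaries/blob/main/processor.ipynb"}
      ]
    }
  }
}
"""


def summarize_lifecycle(record):
    """Return one human-readable line per documented lifecycle decision.

    Hypothetical helper: missing sections are skipped rather than
    treated as errors, since data need not pass through every stage.
    """
    lifecycle = record.get("lifecycle", {})
    summary = []
    # Acquisition comes first: which datasets was this one derived from?
    for ingredient in lifecycle.get("acquisition", {}).get("ingredients", []):
        summary.append(f"acquisition: derived from {ingredient.get('$id', 'unknown')}")
    # Processing: where are the step-by-step choices written down?
    for choice in lifecycle.get("processing", {}).get("choices", []):
        summary.append(
            f"processing: {choice.get('title', 'untitled')} "
            f"by {choice.get('author', 'unknown')}"
        )
    # Maintenance: who is responsible for the dataset now?
    maintainer = lifecycle.get("maintenance", {}).get("officialMaintainer")
    if maintainer:
        summary.append(f"maintenance: {maintainer}")
    return summary


if __name__ == "__main__":
    for line in summarize_lifecycle(json.loads(RECORD_JSON)):
        print(line)
```

Because each stage is optional, the helper falls back to empty defaults instead of raising, so a record that documents only acquisition still yields a partial genealogy.
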

# Example use

	"lifecycle": {
		"acquisition": {
			"ingredients": [{
				"$id": "ark:/76611/dkgskarjx",
				"notes": "The Leventhal Map & Education Center changed this ingredient dataset to create the new simplified towns dataset"
			}]
		},
		"description": {
			"contextProvider": {
				"name": "Belle Lipton",
				"relationshipToData": "Archivist"
			},
			"contextOnBehalfOf": {
				"name": "Garrett Dash Nelson",
				"relationshipToData": "Processor/Cleaner"
			},
			"contextPublicationDate": "2020-10-02",
			"metadataSchema": {
				"schemaName": "LMEC Data Description Schema",
				"$id": "https://github.com/nblmc/Data-Context/blob/master/schema.json"
			}
		},
		"maintenance": {
			"officialMaintainer": "Leventhal Map & Education Center",
			"maintenanceFrequency": "This dataset is not frequently updated"
		},
		"processing": {
			"choices": [{
				"title": "Step-by-Step Process",
				"author": "Garrett Dash Nelson",
				"format": "Jupyter Notebook",
				"relatedResourceURL": "https://github.com/nblmc/massachusetts-municipal-boundaries/blob/main/processor.ipynb"
			}]
		}
	}

Last Updated: 3/9/2021, 12:32:02 PM