Resource Lifecycle

5 Stages of the Resource Lifecycle

flowchart LR

  I((1.<br> IDENTIFY)) --> H[/2. <br> HARVEST/] --> P[3. <br> EDIT] --> X[4. <br>INDEX] --> M{{5. <br>MAINTAIN}}--> H[/2. <br>HARVEST/]

1. Identify

BTAA-GIN Team Members and Product Manager

Team members seek out new content for the geoportal. See the page How to Submit Resources to the BTAA Geoportal for more information.

2. Harvest

Graduate Research Assistants and Product Manager

This stage involves obtaining the metadata for resources. At a minimum, this will include a title and an access link. Ideally, it will also include descriptions, dates, authors, rights, keywords, and more.

Here are the most common ways that we obtain the metadata:

a BTAA-GIN Team Member sends us the metadata values as individual documents or as a combined spreadsheet
we are provided with (or are able to find) an API that will automatically generate the metadata in a structured file, such as JSON or XML
we develop a customized script to scrape directly from the HTML on a source's website
we manually copy and paste the metadata into a spreadsheet
a combination of two or more of the above

This step also involves using a crosswalk to convert the metadata into the schema needed for the Geoportal. Our goal is to end up with a spreadsheet containing columns matching our metadata template.
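A crosswalk can be pictured as a mapping from source field names to template column names. The following is a minimal sketch under stated assumptions, not the team's actual script: the source field names (`name`, `url`, `notes`), the target column names (`Title`, `Access Link`, `Description`), and the helper functions `apply_crosswalk` and `to_csv` are all hypothetical.

```python
import csv
import io

# Hypothetical crosswalk: source field name -> template column name.
CROSSWALK = {
    "name": "Title",
    "url": "Access Link",
    "notes": "Description",
}


def apply_crosswalk(source_records, crosswalk):
    """Rename source fields to template columns; unmapped fields are dropped."""
    rows = []
    for record in source_records:
        # Fields missing from the source become empty strings.
        row = {target: record.get(source, "") for source, target in crosswalk.items()}
        rows.append(row)
    return rows


def to_csv(rows, columns):
    """Serialize rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up harvested records, for illustration only.
harvested = [
    {"name": "County Parcels", "url": "https://example.com/parcels", "extra": "ignored"},
    {"name": "Bike Trails", "url": "https://example.com/trails", "notes": "Updated yearly"},
]
rows = apply_crosswalk(harvested, CROSSWALK)
csv_text = to_csv(rows, list(CROSSWALK.values()))
print(csv_text)
```

Keeping the mapping in a single dictionary means a new source usually needs only a new mapping, while the conversion code stays the same.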

Why do we rely on CSV?

CSV (Comma Separated Values) files organize tabular data in plain text format, where each row of data is separated by a line break, and each column of data is separated by a delimiter, usually a comma.

We have found this tabular format to be the most human-readable way to batch create, edit, and troubleshoot metadata records. We can visually scan large numbers of records at once and normalize the values in ways that would be difficult with natively nested formats, like JSON or XML. Therefore, many of our workflow processes involve transforming metadata to and from CSV.
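To illustrate what moving from a nested format to a tabular one involves, here is a minimal sketch that flattens one nested JSON record into a single row. The function `flatten`, the dot-joined column names, the `|` multi-value delimiter, and the sample record are all assumptions made for this example.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict suitable for one CSV row.

    Lists are joined with "|" (an assumed multi-value delimiter).
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing keys with the parent name.
            flat.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            flat[full_key] = "|".join(str(item) for item in value)
        else:
            flat[full_key] = value
    return flat


# Made-up nested record, for illustration only.
nested = json.loads("""
{
  "title": "Bike Trails",
  "publisher": {"name": "Example County", "state": "Minnesota"},
  "keywords": ["transportation", "recreation"]
}
""")
row = flatten(nested)
print(row)
```

Once every record is a flat dictionary with the same keys, the records can be written as rows of a spreadsheet and scanned column by column.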

3. Edit

Graduate Research Assistants and Product Manager

When working with metadata, it is common to come across missing or corrupted values, which require troubleshooting and manual editing in our spreadsheets. Refer to the Collections Project Board for examples of this work.

After compiling the metadata, we run a validation and cleaning script to ensure the records conform to the required elements of our schema. Finally, we upload the completed spreadsheet to GBL Admin, which serves as the administrative interface for the Geoportal. If GBL Admin detects any formatting errors, it will issue a warning and may reject the upload.
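The kind of check a validation script might run before upload can be sketched as follows. This is an illustration only, not the actual script or the checks GBL Admin performs: the required column names (`Title`, `Access Link`) are assumed from the minimum described in the Harvest stage, and the function `validate_rows` is hypothetical.

```python
import csv
import io

# Assumed required columns, based on the minimum described in the Harvest
# stage (a title and an access link); the real schema may require more.
REQUIRED_COLUMNS = ["Title", "Access Link"]


def validate_rows(csv_text, required=REQUIRED_COLUMNS):
    """Return a list of (row_number, message) problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    missing_columns = [c for c in required if c not in (reader.fieldnames or [])]
    for column in missing_columns:
        problems.append((1, f"missing column: {column}"))
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for column in required:
            if column in missing_columns:
                continue  # already reported once for the header
            if not (row.get(column) or "").strip():
                problems.append((row_number, f"empty value: {column}"))
    return problems


# Made-up sample: the second data row has a blank title.
sample = (
    "Title,Access Link\n"
    "County Parcels,https://example.com/parcels\n"
    " ,https://example.com/trails\n"
)
problems = validate_rows(sample)
print(problems)
```

Reporting row numbers alongside each message makes it easier to find and fix the offending cell in the spreadsheet.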

4. Index

Product Manager

Once the metadata is successfully uploaded to GBL Admin, we can publish the records to the Geoportal. The technology that actually stores the records and enables searching is called Solr. The action of adding records is known as "Indexing."

Periodically, we need to remove records from the Geoportal. To do this, we use GBL Admin to either delete them or change their status to "unpublished."

5. Maintain

BTAA-GIN Team Members, Graduate Research Assistants, and Product Manager

The Geoportal is programmatically checked for broken links on a monthly basis. Broken links are fixed either by manually repairing them or by reharvesting from the source.
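A programmatic link check could look roughly like the sketch below. It is not the tool the team uses: the functions `fetch_status` and `find_broken_links`, the record identifiers, and the rule that a status of 400 or above counts as broken are all assumptions for illustration.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a URL, or None if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (OSError, ValueError):
        # No usable answer: DNS failure, timeout, malformed URL, and so on.
        return None


def find_broken_links(records, get_status=fetch_status):
    """Return (record_id, url, status) for links with no response or status >= 400."""
    broken = []
    for record_id, url in records:
        status = get_status(url)
        if status is None or status >= 400:
            broken.append((record_id, url, status))
    return broken


# Demonstration with a stand-in status lookup, so no network access is needed.
fake_statuses = {"https://example.com/ok": 200, "https://example.com/gone": 404}
records = [
    ("rec-1", "https://example.com/ok"),
    ("rec-2", "https://example.com/gone"),
    ("rec-3", "https://example.com/unreachable"),
]
broken = find_broken_links(records, get_status=fake_statuses.get)
print(broken)
```

Some servers reject `HEAD` requests, so a real checker would probably retry with `GET` before flagging a link as broken.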

Sequence diagram of Resource Lifecycle





    sequenceDiagram
            actor Team Member
            actor Product Manager
            participant GitHub
            actor Research Assistant
            participant GBL Admin
            participant Geoportal   


            Note right of Team Member:  IDENTIFY

            Team Member->>Product Manager: Submit Resources
            Product Manager->>GitHub: Create GitHub issue
            GitHub ->>Research Assistant: Assign issue
            Note left of Research Assistant:  HARVEST
            Note left of Research Assistant:  EDIT 

            Research Assistant->>GBL Admin: Upload records
            Research Assistant ->>GitHub: Update GitHub issue
            Note right of GBL Admin:  PUBLISH 

            Product Manager->>GBL Admin: Publish records
            GBL Admin->>Geoportal: Send records online 
            Product Manager->>GitHub: Close GitHub issue
            Product Manager ->> Team Member: Share link to published records

            Note left of Research Assistant:  MAINTAIN