Resource Lifecycle
5 Stages of the Resource Lifecycle
flowchart LR
I((1.<br> IDENTIFY)) --> H[/2. <br> HARVEST/] --> P[3. <br> EDIT] --> X[4. <br>INDEX] --> M{{5. <br>MAINTAIN}}--> H[/2. <br>HARVEST/]
1. Identify
BTAA-GIN Team Members and Product Manager
Team members seek out new content for the Geoportal. See the page "How to Submit Resources to the BTAA Geoportal" for more information.
2. Harvest
Graduate Research Assistants and Product Manager
This stage involves obtaining the metadata for resources. At a minimum, the metadata includes a title and an access link. Ideally, it also includes descriptions, dates, authors, rights, keywords, and more.
Here are the most common ways that we obtain the metadata:
- a BTAA-GIN Team Member sends us the metadata values as individual documents or as a combined spreadsheet
- we are provided with (or are able to find) an API that will automatically generate the metadata in a structured file, such as JSON or XML
- we develop a customized script to scrape directly from the HTML on a source's website
- we manually copy and paste the metadata into a spreadsheet
- a combination of one or more of the above
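When a source has no API, a scraping script reads the page's HTML and pulls out the fields we need. The sketch below uses only Python's standard-library `html.parser`. It assumes a hypothetical page layout in which the dataset title sits in an `<h1>` and the access link is an `<a>` whose visible text contains "Download"; the class name, function name, and output keys are illustrative, and a real source would need its own selectors.

```python
from html.parser import HTMLParser


class MetadataScraper(HTMLParser):
    """Collect a title and download links from one page.

    Assumed (hypothetical) layout: the title is inside <h1>, and access
    links are <a> elements whose visible text mentions "Download".
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_h1 = False
        self._current_href = None  # href of the <a> we are inside, if any
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "a":
            self._current_href = dict(attrs).get("href")
            self._link_text = []

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag == "a":
            text = "".join(self._link_text).strip()
            if self._current_href and "download" in text.lower():
                self.links.append(self._current_href)
            self._current_href = None

    def handle_data(self, data):
        if self._in_h1:
            self.title += data
        if self._current_href is not None:
            self._link_text.append(data)


def scrape(html):
    """Return a flat dict of the scraped values for one page."""
    parser = MetadataScraper()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "access_links": parser.links}


page = (
    "<h1>Parks of Example County</h1>"
    '<a href="/about">About</a>'
    '<a href="https://example.org/parks.zip">Download shapefile</a>'
)
result = scrape(page)
print(result)
```

Each scraped dict can then become one row of the harvest spreadsheet.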
This step also involves using a crosswalk to convert the metadata into the schema needed for the Geoportal. Our goal is to end up with a spreadsheet containing columns matching our metadata template.
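A crosswalk is essentially a mapping from the source's field names to the columns of our template. Below is a minimal sketch of that idea in Python. The source keys (`name`, `summary`, and so on) and the target column names are hypothetical placeholders, not the actual BTAA-GIN metadata template.

```python
import csv
import io

# Hypothetical crosswalk: source JSON field -> template column.
# Neither side is the real BTAA-GIN template; both are placeholders.
CROSSWALK = {
    "name": "Title",
    "summary": "Description",
    "url": "Access URL",
    "issued": "Date Issued",
}


def crosswalk(source_records, mapping=CROSSWALK):
    """Rename source fields to template columns; unmapped fields are dropped."""
    return [
        {target: record.get(source, "") for source, target in mapping.items()}
        for record in source_records
    ]


def to_csv(rows, columns):
    """Serialize crosswalked rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


source = [
    {
        "name": "Bike Lanes",
        "summary": "City bike network",
        "url": "https://example.org/bikes",
        "issued": "2021",
        "internal_id": 7,  # not in the crosswalk, so it is dropped
    }
]
rows = crosswalk(source)
print(to_csv(rows, list(CROSSWALK.values())))
```

Fields missing from a source record come out as empty cells, which makes gaps easy to spot in the spreadsheet.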
Why do we rely on CSV?
CSV (comma-separated values) files store tabular data as plain text: each row of data is on its own line, and the columns within a row are separated by a delimiter, usually a comma.
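Metadata values often contain commas or quotation marks themselves. The standard CSV convention handles this by wrapping such values in double quotes and doubling any embedded quotes. This short Python sketch, using the standard `csv` module with made-up example rows, shows that convention and confirms that the file reads back into the same columns.

```python
import csv
import io

rows = [
    ["ID", "Title", "Keywords"],
    ["rec-001", "Roads, Highways, and Bridges", "transportation"],  # contains commas
    ["rec-002", 'The "Old" Town Map', "history"],  # contains quote characters
]

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers exactly the same three columns per row.
parsed = list(csv.reader(io.StringIO(text)))
assert parsed == rows
```

This quoting is why it is safer to write CSV with a CSV library than by joining strings with commas by hand.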
We have found this tabular format to be the most human-readable way to batch create, edit, and troubleshoot metadata records. We can visually scan large numbers of records at once and normalize the values in ways that would be difficult with native nested formats, like JSON or XML. Therefore, many of our workflow processes involve transforming metadata to and from CSV.
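Moving from a nested format to CSV means flattening: nested objects become separate columns, and lists become a single cell. The sketch below shows one way to do that in Python. The pipe character as the separator for multi-valued cells and the dotted column names are assumptions made for illustration, and the input record is invented.

```python
import csv
import io
import json

# A nested record, shaped like a hypothetical API response.
nested = json.loads("""
{
  "id": "rec-003",
  "title": "County Boundaries",
  "creators": [{"name": "State GIS Office"}, {"name": "County Assessor"}],
  "keywords": ["boundaries", "counties"],
  "extent": {"west": -97.2, "east": -89.5}
}
""")


def flatten(record, multi_sep="|"):
    """Turn one nested record into a flat dict suitable for one CSV row.

    Lists are joined with `multi_sep` (an assumed convention for
    multi-valued cells); nested objects become dotted column names.
    """
    row = {}

    def walk(value, prefix):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{prefix}.{key}" if prefix else key)
        elif isinstance(value, list):
            parts = []
            for item in value:
                if isinstance(item, dict):
                    parts.extend(str(v) for v in item.values())
                else:
                    parts.append(str(item))
            row[prefix] = multi_sep.join(parts)
        else:
            row[prefix] = value

    walk(record, "")
    return row


row = flatten(nested)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Once flattened, every record occupies one line, so inconsistent values (for example, two spellings of the same creator) line up in a single column where they are easy to spot and normalize.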
3. Edit
Graduate Research Assistants and Product Manager
When working with metadata, it is common to come across missing or corrupted values, which require troubleshooting and manual editing in our spreadsheets. Refer to the Collections Project Board for examples of this work.
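Two frequent problems are empty cells and "mojibake", where UTF-8 text was decoded with the wrong encoding (for example, "é" showing up as "Ã©"). The Python sketch below flags both. The repair is a common heuristic (re-encode as Windows-1252 or Latin-1, then decode as UTF-8), not a guaranteed fix, and the function names and example rows are hypothetical.

```python
def repair_mojibake(value):
    """Try to undo UTF-8 text that was mis-decoded as Windows-1252 or Latin-1.

    Returns the original string unchanged if no repair applies.
    """
    for codec in ("cp1252", "latin-1"):
        try:
            return value.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return value


def find_problems(rows, required):
    """Return (row number, column, issue) tuples for manual review."""
    problems = []
    for index, row in enumerate(rows, start=1):
        for column in required:
            if not str(row.get(column, "")).strip():
                problems.append((index, column, "missing"))
        for column, value in row.items():
            if isinstance(value, str) and repair_mojibake(value) != value:
                problems.append((index, column, "possible encoding damage"))
    return problems


sample = [
    {"Title": "CafÃ© Locations", "Description": ""},
    {"Title": "Parks", "Description": "City parks"},
]
print(find_problems(sample, ["Title", "Description"]))
```

Flagged cells still deserve a human look before any automatic repair is accepted, since a heuristic like this can occasionally misfire.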
After compiling the metadata, we run a validation and cleaning script to ensure the records conform to the required elements of our schema. Finally, we upload the completed spreadsheet to GBL Admin, which serves as the administrative interface for the Geoportal. If GBL Admin detects any formatting errors, it will issue a warning and may reject the upload.
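The validation step can be pictured as a function that reads the spreadsheet and returns a list of errors, where an empty list means the file is ready to upload. This Python sketch checks for required columns, empty required values, duplicate IDs, and non-web access links. The column names are hypothetical stand-ins for the required elements of our schema, and this is not the actual validation script.

```python
import csv
import io

# Hypothetical required columns; the real list comes from our metadata template.
REQUIRED = ["ID", "Title", "Access URL"]


def validate_csv(text, required=REQUIRED, id_column="ID", url_column="Access URL"):
    """Return human-readable errors; an empty list means the file passed."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing_columns = [c for c in required if c not in header]
    if missing_columns:
        return [f"missing column(s): {', '.join(missing_columns)}"]

    errors = []
    seen_ids = set()
    for row in reader:
        line = reader.line_num  # physical line just read, counting the header
        for column in required:
            if not (row[column] or "").strip():
                errors.append(f"line {line}: empty {column}")
        record_id = (row[id_column] or "").strip()
        if record_id and record_id in seen_ids:
            errors.append(f"line {line}: duplicate ID {record_id}")
        seen_ids.add(record_id)
        url = (row[url_column] or "").strip()
        if url and not url.startswith(("http://", "https://")):
            errors.append(f"line {line}: {url_column} is not an http(s) link")
    return errors


sample = (
    "ID,Title,Access URL\n"
    "rec-1,Roads,https://example.org/a\n"
    "rec-1,,ftp://example.org/b\n"
)
for error in validate_csv(sample):
    print(error)
```

Catching these problems locally is quicker than discovering them through GBL Admin's upload warnings.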
4. Index
Product Manager
Once the metadata is successfully uploaded to GBL Admin, we can publish the records to the Geoportal. The technology that actually stores the records and enables searching is called Solr. The action of adding records is known as "Indexing."
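GBL Admin handles indexing for us, but it can help to see what "adding records to Solr" means at the HTTP level. Solr's JSON update handler accepts a POST of a JSON array of documents to a core's `/update` path. The Python sketch below only builds such a request without sending it; the Solr URL, the core name `geoportal`, and the document fields are hypothetical and do not describe our production setup or the exact requests GBL Admin makes.

```python
import json
import urllib.request


def build_index_request(records, solr_url="http://localhost:8983/solr/geoportal"):
    """Build (but do not send) a Solr JSON update request that adds `records`.

    `solr_url` and the core name are placeholders for illustration only.
    """
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        f"{solr_url}/update?commit=true",  # commit=true makes the documents searchable
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


docs = [{"id": "rec-001", "title": "Roads"}]  # hypothetical field names
request = build_index_request(docs)
print(request.get_method(), request.full_url)
# Against a running Solr instance, urllib.request.urlopen(request) would send it.
```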
Periodically, we need to remove records from the Geoportal. To do this, we use GBL Admin to either delete them or change their status to "unpublished."
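The difference between the two options is that deleting removes a record entirely, while unpublishing hides it from the public Geoportal but keeps it available to restore later. The toy Python model below illustrates only that distinction. The in-memory dictionary and the `status` field are hypothetical and do not reflect how GBL Admin actually stores records.

```python
# Hypothetical in-memory stand-in for the admin database: record ID -> record.
records = {
    "rec-001": {"title": "Roads", "status": "published"},
    "rec-002": {"title": "Old Parcels", "status": "published"},
    "rec-003": {"title": "Rivers", "status": "published"},
}


def unpublish(store, record_ids):
    """Hide records from the public site but keep them for possible reuse."""
    for record_id in record_ids:
        store[record_id]["status"] = "unpublished"


def delete(store, record_ids):
    """Remove records entirely; they would need to be reharvested to return."""
    for record_id in record_ids:
        store.pop(record_id, None)


def public_view(store):
    """IDs a Geoportal visitor would see: published records only."""
    return sorted(rid for rid, rec in store.items() if rec["status"] == "published")


unpublish(records, ["rec-002"])
delete(records, ["rec-003"])
print(public_view(records))
```

In this model, unpublishing is the reversible choice, which suits records that are only temporarily unavailable.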
5. Maintain
BTAA-GIN Team Members, Graduate Research Assistants, and Product Manager
The Geoportal is programmatically checked for broken links on a monthly basis. Broken links are fixed either by repairing them manually or by reharvesting the records from the source.
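A link check boils down to requesting each access URL and recording those that fail. The following Python sketch uses only the standard library; it is an illustration, not the checker the Geoportal actually runs. The function names are hypothetical, and the `fetch` parameter exists so that the logic can be tested without network access.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status for `url`, or None if the server could not be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:  # server answered with 4xx/5xx
        return error.code
    except OSError:  # DNS failure, refused connection, timeout, etc.
        return None


def find_broken_links(urls, fetch=fetch_status):
    """Return (url, status) pairs whose status is missing or 400 and above."""
    broken = []
    for url in urls:
        status = fetch(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken


# Offline usage with a stand-in fetch function:
fake_statuses = {"https://a.example": 200, "https://b.example": 404, "https://c.example": None}
print(find_broken_links(list(fake_statuses), fetch=fake_statuses.get))
```

Some servers reject HEAD requests (often with a 405), so a production checker would likely retry those URLs with GET before reporting them as broken.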
Sequence diagram of the Resource Lifecycle
sequenceDiagram
actor Team Member
actor Product Manager
participant GitHub
actor Research Assistant
participant GBL Admin
participant Geoportal
Note right of Team Member: IDENTIFY
Team Member->>Product Manager: Submit Resources
Product Manager->>GitHub: Create GitHub issue
GitHub ->>Research Assistant: Assign issue
Note left of Research Assistant: HARVEST
Note left of Research Assistant: EDIT
Research Assistant->>GBL Admin: Upload records
Research Assistant ->>GitHub: Update GitHub issue
Note right of GBL Admin: PUBLISH
Product Manager->>GBL Admin: Publish records
GBL Admin->>Geoportal: Send records online
Product Manager->>GitHub: Close GitHub issue
Product Manager ->> Team Member: Share link to published records
Note left of Research Assistant: MAINTAIN