
admin DATA MANAGEMENT

Sharon Grant edited this page Aug 20, 2024 · 25 revisions

Last Updated: July 2022

Purpose of Document

NSF requires that projects which develop digital products as part of their outputs fill out a sustainability questionnaire as part of the application process. This document may be used to assist applications to any grant-awarding body that requires a data management plan.

Please contact the Technology Department ([email protected]) for further information, clarification or assistance.

Example form

IPR

In general, the Field Museum of Natural History (FMNH) claims copyright to its collection data and images. However, collections metadata and images are released under Creative Commons licensing. See the Museum's Norms and Considerations for more detailed information.

Museum-wide collections management system

The Field Museum uses the Axiell (formerly KE Software) EMu (Electronic Museum) platform as its official collections management system (What is EMu?). The Museum has a good track record of building, sustaining and supporting systems, and it supports the migration of primary collections data into EMu by providing capital support for software upgrades and site licenses. Over the past 15 years, with the assistance of NSF funding, internal capital projects and Museum operations, three of the four science departments have been fully integrated into EMu.

The Botany department was the first to be fully integrated, in 2005, and has been serving data to the community through its web interfaces since 2007. The Anthropology department followed in 2010. Zoology, the largest scientific department at the Museum, was fully migrated to EMu in 2011. The Palaeobotanical and Fossil Invertebrate collections completed migration in 2013 (the Meteoritics & Minerals and Vertebrate Paleontology divisions are in progress). Exhibitions Registration was fully integrated into the system in 2014, and in 2016 Exhibitions Development and Productions transferred from their old FileMaker system. The Museum’s Action Center began using EMu to hold its observations and rapid reference guide datasets in 2017.

EMu is highly modular and has a hybrid object-relational data model that lets users focus on the data rather than on data design. It provides high-speed searches and supports complex queries across data boundaries (e.g., different objects pertaining to different departments collected from the same region). At the core of EMu is the central catalogue, which contains all information about objects in the care of the institution. This central catalogue is supported and enhanced by complex accessory information held in different modules, including taxonomy, sites, parties, bibliography, multimedia, transactions and stratigraphy, among many others.

Primary collections data are stored on a dedicated server, whilst vetted, non-sensitive data are replicated on a separate, outward-facing server for delivery to the FMNH website and, separately, to the Museum's Integrated Publishing Toolkit (IPT) instance (https://fmipt.fieldmuseum.org/), which serves other external sites including GBIF (http://www.gbif.org/), VertNet (http://vertnet.org/index.php), iDigBio (https://www.idigbio.org/) and others.

We host our publicly available collections on the web and we’ve detailed the history and decision-making process around our collections’ web presence.

A formal network of Field Museum collections managers, registrars, exhibition developers and the Technology Department is responsible for the integrity and safety of data held within EMu. See the service level agreement and data responsibility structure.

The Museum has 55 concurrent licenses: 48 assigned to the live environment and 7 to the web environment.

For further background in the implementation of EMu at the Field Museum see “Collections Managements Systems at the FMNH: 2000 -12”.

The Field Museum's implementation of EMu is documented in the EMu Documentation GitHub repo.

Digital Asset Management System (DAMS)

In 2022, the Field Museum implemented NetX as its Museum-wide Digital Asset Management System. This system replaces EMu as the front end for all Museum staff and authorised users to access the Field Museum's digital assets.

History: In 2014, with the assistance of the Grainger Foundation, the Field Museum decided to use EMu’s multimedia asset handling functionality as the institutional digital asset management system (DAMS), known as DAMu. In August 2014, asset archiving workflows and new EMu functionality were deployed museum-wide. This included the move to DNG as the standard FM archival format for images. See below for hardware implemented to support DAMu.

Dissemination / Access

Primary public dissemination of, and access to, collections data is via the Museum’s departmental collections sites (e.g., http://collections-zoology.fieldmuseum.org/) and project websites. The Museum launched its main site in Drupal 7 in March 2011, and it underwent major updates in 2014, 2016 and 2018. A major feature of this was the development of an open source Drupal/Solr connection to the Museum’s collections management system, EMu (Axiell). This open source software is publicly available at https://github.com/palantirnet/kiwi and http://drupal.org/project/sarnia.

Transfer of existing collections sites to this platform is ongoing and will make it possible for changes, additions or modifications to data to be rapidly reflected to the web user. To facilitate this, a dedicated collections web data server is in place, running Solr indexing specifically for collections data. This increases the speed of online searches, relieves pressure on the live server, provides security for sensitive data and allows full integration of collections data across FM sites.
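As a rough illustration of how a web front end might query a Solr index of collections data, the sketch below builds a request URL for Solr's standard select handler. The host name, the core name (`collections`) and the `department` field are hypothetical placeholders, not the Museum's actual configuration. Only the common Solr query parameters (`q`, `fq`, `rows`, `start`, `wt`) are used.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, text, department=None, rows=10, start=0):
    """Build a URL for Solr's /select handler.

    base_url, core and the "department" field are illustrative assumptions.
    """
    params = [
        ("q", text),        # main query, matched against the default search field
        ("rows", rows),     # page size
        ("start", start),   # offset of the first result
        ("wt", "json"),     # ask for a JSON response
    ]
    if department:
        # A filter query narrows results without affecting relevance scoring.
        params.append(("fq", f'department:"{department}"'))
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


url = build_solr_select_url(
    "https://solr.example.org", "collections", "Puma concolor", department="Zoology"
)
print(url)
```

Because searches are answered by the index rather than by the live database, a request like this never touches the primary collections server.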

DatoCMS

The current version of the fieldmuseum.org website launched on March 15, 2023. We moved from the Drupal 7 platform, hosted with Pantheon, to Next.js/DatoCMS, hosted on Vercel. A big part of that process was exporting and scraping all of the page data from Drupal 7 and importing it into DatoCMS. Kate Webbink was primarily responsible for this data extraction process, and the majority of the code used for the ETL (extract-transform-load) process is located in the parascraper repository. DatoCMS is currently our CMS of choice for housing web-ready, web-related data for use with any of our websites or apps.

IPT

Data is also served to the GBIF data portal (http://data.gbif.org/datasets/provider/49) from a dedicated IPT (http://fmipt.fieldmuseum.org:8080/ipt/) service, which currently provides over 2 million records to the public and peers. Field Museum data from these publicly available sources are currently in use by VertNet (http://vertnet.org/index.php), EOL (http://eol.org/), The Lichen Consortium (http://lichenportal.org/portal/), iDigBio (https://www.idigbio.org/), SpeciesLink (http://www.splink.org.br/) and MorphoSource (https://www.morphosource.org/), amongst others. (The FM DiGIR portal was decommissioned in 2015.)

The Field Museum is committed to the integration of its primary collections and research systems data. As part of the Museum’s 2011-12 data strategy, our collections management software, server and cloud-based storage solutions, and web technologies are being integrated to provide a holistic approach to protecting and providing access to these important datasets.

Current Field Museum collections data search properties:

Whitebox Sites

  • mm.fieldmuseum.org
  • db.fieldmuseum.org
  • pj.fieldmuseum.org

Collections Search Sites

  • collections-zoology.fieldmuseum.org
  • collections-anthropology.fieldmuseum.org
  • collections-botany.fieldmuseum.org
  • collections-geology.fieldmuseum.org

See https://emudata.fieldmuseum.org/ for a full list of EMu-driven FM web properties.

Data standards and interoperability

The Anthropological, Botanical, Zoological and the majority of the Geological Collections Areas, together with the Exhibitions Department, are integrated into EMu and manage the movement and history of the Museum’s objects, artifacts and specimens in its shared database. The Museum currently exports Darwin Core and Audiovisual Core datasets from EMu as Darwin Core Archive files via its IPT server (http://fmipt.fieldmuseum.org:8080/ipt/) to GBIF, VertNet, iDigBio, the Lichen and Bryophyte portals and Symbiota.
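To show what a consumer of these exports works with, here is a minimal sketch of reading the core data file of a Darwin Core Archive using only the Python standard library. A Darwin Core Archive is a zip file whose `meta.xml` descriptor names the core data file and maps column indexes to Darwin Core terms. The function name `read_dwca_core` and the example record are made up for illustration and are not taken from the Museum's data. The sketch ignores extension files and default values.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by Darwin Core Archive descriptor files (meta.xml).
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def read_dwca_core(archive_bytes):
    """Return the core data file of a Darwin Core Archive as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find("dwc:core", NS)
        location = core.find("dwc:files/dwc:location", NS).text

        # Map each column index to a short term name, e.g. "catalogNumber".
        columns = {}
        id_element = core.find("dwc:id", NS)
        if id_element is not None:
            columns[int(id_element.get("index"))] = "id"
        for field in core.findall("dwc:field", NS):
            if field.get("index") is not None:  # skip fields with no column
                term = field.get("term")
                columns[int(field.get("index"))] = term.rsplit("/", 1)[-1]

        # meta.xml writes a tab delimiter as the two characters "\t".
        raw = core.get("fieldsTerminatedBy", ",")
        delimiter = "\t" if raw == "\\t" else raw
        skip = int(core.get("ignoreHeaderLines", "0"))
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))

    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row][skip:]
    return [{name: row[i] for i, name in columns.items() if i < len(row)}
            for row in rows]


def build_example_archive():
    """Build a tiny in-memory archive with one made-up record."""
    meta = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/catalogNumber"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>"""
    data = "id\tcatalogNumber\tscientificName\nocc-1\tEX-0001\tPuma concolor\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("meta.xml", meta)
        zf.writestr("occurrence.txt", data)
    return buffer.getvalue()


records = read_dwca_core(build_example_archive())
print(records)
```

Keying each record by term name rather than column position means the same reader works across archives whose publishers order their columns differently.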

In 2018, the Technology Department, in close collaboration with Science and Education, began the process of overhauling and streamlining its collections management system. Cross-functional teams were assembled and each module reviewed. Development is currently ongoing.

EMu stats (December 2018): 34,461 new records in total were created in EMu in September, of which:

  • 6,494 were new catalogue records
  • 14,050 were new multimedia records, comprising 150 GB of new media files

Fun Fact (as an example of why we are working on standardising): There were >105,000 duplicate Multimedia records in EMu. These were consolidated to ~43,000 records.

Sustainable support of infrastructure

Storage for collections/museum digital assets is maintained by the Museum’s Technology Department. Currently, it stores more than 80 TB of data on a hybrid network attached storage (NAS) and archive solution hosted on site. The NAS also uses an encrypted “cloud” backend, which allows storage to scale while minimizing on-premise hardware costs. All drives in the storage arrays are configured for high availability, hot-swap failover and redundancy. The solutions are also covered under a 24x7x365 maintenance agreement with a 4-hour response time. There are provisions for expansion of the NAS annually.

The Museum has deployed server virtualization technology over the past few years and currently has 90% of all Museum servers virtualized on a 5-node hyper-converged virtual machine cluster. Additionally, the Museum is leveraging solutions that allow our datacenter to (virtually) extend into the cloud for even higher levels of uptime. In instances where virtualizing a server isn’t the best option, physical servers with built-in disk and power redundancy are deployed to ensure high availability and to minimize downtime. All production servers are covered under a maintenance agreement with a 4-hour response time.

The Museum currently operates using a gigabit fiber connection for internet connectivity with a redundant microwave 1000 Mbps connection as well as a tertiary cellular backup. These provide service for both the wired LAN and Wi-Fi networks. With these connections, matched with the Museum’s fully redundant edge network, an extreme level of uptime and reliability can be achieved.

Long-term archiving / Data storage / Maintenance

Collections data and associated multimedia created as part of grant funded or capital projects are considered priority core FMNH digital assets and as such are backed up using the following schedule:

Daily snapshots from 6pm to 6am with incremental updates every 4 hrs, with indefinite offsite retention.

Grant funded storage

Archival network storage purchased for the storage of data and media generated by a grant-funded project is backed up following the same schedule as outlined above.

Sustainability of project outcomes and digital content

Project-specific data - After the lifetime of any grant, and at such time as no dedicated project personnel are available to maintain a project website that serves FM data, any data and assets created by the project that have been authorised by the responsible CATs will (if not already) be made available via the Museum’s departmental collections search sites.

Project-specific websites - After the lifetime of any grant, and without prior documented arrangements with the IT department, project-specific websites will not be supported over and above regular security updates to web servers and software.

Strategic Vision

Collections are at the core of the Museum’s mission and vision. Images and metadata, long thought of as surrogates for physical specimens and objects, are now considered not only vital records for disaster recovery but, increasingly, invaluable collections assets with intrinsic value in their own right. As such, the Technology Department sees their long-term preservation as a critical element of its mission.

EMu hardware

  • Shackleton: Live DB - Purchased 2019
  • Ross: Live Web - Purchased 2014

Server Specifications

Timeline/Sequence
