Jornada LTER - Data Management Plan

Table of Contents

  1. Introduction
  2. Information Management System
    1. Data Management Implementation/Process
    2. Management of Data and Metadata
    3. Data Protocols and Metadata Standards
    4. Tools and Resources Dedicated to Harvest, Document, Archive, Manage, and Provide Data Access
      1. Geodatabase
      2. Website, Data Catalogs, and Geoportals
    5. Networking and Computing Services
      1. Network Services
      2. Computer Services
      3. Data Protection
    6. Information Management Team
  3. Milestones and Deliverables Relative to Current LTER Network Activities
  4. LTER Network and Community Activities

Figures

  1. Data Management Process
  2. Jornada Information Management System
  3. Milestones and Deliverables Timeline

  1. Introduction

    The Jornada Information Management System (JIMS) provides the infrastructure for the curation, protection, access, and analysis of Jornada LTER (JRN) data holdings. Our mission is to protect and provide access to publicly funded research data, tools, and findings that result from JRN research and associated collaborations. The purpose of our information management system is to provide protocols and services for data collection, verification, organization, archiving, and distribution. One of the primary tools for ensuring the long-term usefulness of data and products is detailed metadata that describe the research project and its related data sets. Metadata are shared and leveraged among the services and tools of JIMS. We provide access to hundreds of data sets linked directly to our program or from other research locations and management agencies in support of multi-user needs. Our intent is to provide data sets, science-based information, tools, and technologies that can be used, through simple or more complex analyses, to address the needs of a diverse user community. JIMS is a multi-organization system that contains data and metadata holdings and ancillary information from the Jornada through the U.S. Long-Term Ecological Research Network (LTER), U.S. Department of Agriculture (USDA), and Asombro Institute for Science Education (AISE, http://www.asombro.org/). It also supports collaborative efforts, either through data collection and storage for other sites (e.g., BLM-Malpai Borderlands) or through development of tools to improve data access and analysis (e.g., EcoTrends: http://www.ecotrends.info; The Nature Conservancy Landscape Toolbox: http://www.landscapetoolbox.org/).

  2. Information Management System

    Our system consists of six major components: (a) data management implementation/process, (b) management of data, spatial maps, and imagery, and the creation of and access to associated metadata, (c) formal data management protocols, (d) tools and resources to harvest, document, archive, manage, and provide access to data and metadata, (e) networking and computing services, and (f) the information management team.

    1. Data Management Implementation/Process

      The purpose of data management for the JRN is to provide protocols and services for data collection, verification, organization, archiving, and distribution. Procedures are conducted in accordance with recommendations and guidelines developed by the LTER Information Managers Committee. Data access, acknowledgement, and data management policies are located at http://jornada-www.nmsu.edu/jrndmpol.php.

      One of the biggest challenges in archiving data is migrating historic and current data into formats that are consistent with database rules and that support geospatial analysis and mapping. Processing data files that were collected or designed without database protocols is an enormous workload, one that can be reduced significantly by continued interaction between researchers and data managers throughout the data management process (Figure 1).

      Our site manager, John Anderson, acts as the liaison between researchers and the information management team. His involvement begins during the Project Design phase, when the researcher completes the Jornada Notification of Research form prior to the start of work; this alerts the Information Manager to the new study and potential LTER data sets. Upon initiation of a new study, the researcher completes a Project Documentation form, which provides the second level of "metadata" documentation, and arranges for GPS surveying of the data collection sites. Research-related forms are at http://jornada-www.nmsu.edu/site/dm/readme.php.

      In the Data Collection phase, the data manager helps researchers design field and laboratory data sheets that facilitate data entry and analysis. The investigator completes Data Set Documentation to provide the metadata that fully describe the data set. Both the Project Documentation and the Data Set Documentation are provided with the data set when it is requested or obtained from our website. JIMS data entry programs validate data as they are entered. Computer files are subjected to further verification by graphing and/or error-checking programs, and/or examination by the responsible investigator. Final quality assurance of the data rests with the investigator who submits data for inclusion in the Data Management System. Direct communication between researchers and the Information Manager, Ken Ramsey, is used to ensure the timely submission of data by researchers as required by NSF guidelines.
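      The entry-time validation described above can be illustrated with a minimal sketch. It assumes a simple range check per numeric field; the `FieldRule` class, the field names, and the limits are hypothetical and are not taken from the JIMS data entry programs.

```python
from dataclasses import dataclass


@dataclass
class FieldRule:
    """Allowed range for one numeric field (hypothetical rule format)."""
    name: str
    minimum: float
    maximum: float


def validate_record(record, rules):
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    for rule in rules:
        raw = record.get(rule.name, "")
        # Flag missing or blank values before attempting a numeric check.
        if raw is None or str(raw).strip() == "":
            problems.append(f"{rule.name}: missing value")
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"{rule.name}: not a number ({raw!r})")
            continue
        if not rule.minimum <= value <= rule.maximum:
            problems.append(
                f"{rule.name}: {value} outside [{rule.minimum}, {rule.maximum}]"
            )
    return problems


# Illustrative limits only; real limits would come from the investigator.
RULES = [
    FieldRule("precip_mm", 0.0, 300.0),
    FieldRule("air_temp_c", -30.0, 50.0),
]

if __name__ == "__main__":
    for problem in validate_record({"precip_mm": "-4", "air_temp_c": "abc"}, RULES):
        print(problem)
```

      A check of this kind only catches values that are missing, non-numeric, or out of range; graphing and review by the responsible investigator remain necessary for errors that fall inside plausible limits.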

      Ongoing LTER Network participation includes contributions to the LTER Data Portal (MetaCat), LTER Personnel Directory, All-Site Bibliography, and Climate (ClimDB) databases, as well as representation and participation at the annual Information Managers (IM) Meeting and at workshops. These activities are associated with expanding the capability of the site to acquire, maintain, and exchange information in a timely fashion and to share it readily with other LTER and non-LTER users.



    2. Figure 1, Data Management Process

    3. Management of Data and Metadata

      We employ many tools in this system, including SQL Server, GIS Server, geodatabases, the Drupal website content management system, data entry systems, and open-source geoportals. We are building a system that optimizes the contribution of each of these tools to the total system. We are also optimizing the roles that people play as the system changes. See the section "Website, Data Catalogs, and Geoportals" for an explanation of Drupal and geoportals. We are building and have tested an approach to information management that integrates tabular research data with the spatial component of where the data were collected as a complete data set package. This approach allows us to integrate data from more than one research program or project.

      A plant voucher collection is catalogued, and a database is maintained to provide reference material and documentation of JRN plant species at research sites. A more extensive voucher collection of JRN species is incorporated within the New Mexico State University Department of Biology Herbarium and New Mexico State University Range Science Herbarium. Both herbaria maintain a database of their collections with INRAM (Institute of Natural Resource Analysis and Management, http://biodiversity.inram.org/).

    4. Data Protocols and Metadata Standards

      Procedures are conducted in accordance with recommendations and guidelines developed by the LTER Information Managers Committee (IMC). Data access, acknowledgement, and data management policies are located at http://jornada-www.nmsu.edu/jrndmpol.php. JRN data policies are in accordance with those developed by the LTER IMC (http://www.lternet.edu/data/netpolicy.html). Compliant EML (version 2.1.0) will be produced for each data set and harvested into the LTER Data Portal.

    5. Tools and Resources Dedicated to Harvest, Document, Archive, Manage, and Provide Data Access

      Our website offers options for users to access, query, and understand the data sets online before deciding to download them. Data sets are delivered to users as downloadable data set packages from the data catalogs and geoportals through web queries. The data set package will include metadata files, data files (with coordinates for each data record), a KMZ file (for use within Google Earth), and a shapefile (a spatial representation of the research sites where data were collected). We plan to have all long-term data sets in this system by the end of May 2012. All data sets are currently available online at http://jornada-www.nmsu.edu/datacat.php. The data table and the spatial location (where the data were collected) are treated as integrated objects available in more than one format, such as comma-separated value (CSV) and shapefile. The CSV file with x, y coordinates can be easily added to any spreadsheet, database, GIS, or analytical software. The shapefile format can be used with most commercial and open source GIS systems.

      Data derived from LTER funding are made freely and publicly available within 2 years after collection, per NSF policy. Online availability of our LTER data is through our website, which includes a catalog of Jornada data sets. A listing of long-term data sets is available at http://jornada-www.nmsu.edu/longtermdatasets.php. Data are routinely updated online, typically within one day of receipt. Data are designated as either "Unrestricted" and available online, or "Restricted", with release authorized by the responsible investigator, usually within 2 days of a request. Restricted data sets are those in preparation for publication, or student research data that are protected to give the students the opportunity to publish. A listing of restricted data sets and associated justifications and approvals from the JRN Executive Committee is available at http://jornada-www.nmsu.edu/restricteddatasets.php.


      1. Geodatabase

        The Jornada enterprise geodatabase runs the ESRI ArcSDE spatial data engine on SQL Server 2008. The geodatabase provides storage and access to JRN spatial and tabular research data holdings. The geodatabase is also used to create and manage metadata in FGDC format, which are subsequently used by the JRN website and geoportal, allowing users to visualize, search, and access JRN GIS and tabular data. Map and image services are created from geodatabase resources and are provided by the GIS server (ArcGIS Server for Java) to the geoportal and other web-mapping applications. This provides a visual display of the cataloged spatial data sets to the user. The geodatabase is also used to integrate spatial and tabular research data. Geographic coordinates (x,y in ), as well as key and data set identifier fields, are inserted into comma-separated value text files. This allows the data to be easily imported into a wide range of analytical software packages and databases.
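        As a rough illustration of how coordinates and identifier fields might be combined with tabular records before export to CSV, the sketch below uses only the Python standard library. The column names (`dataset_id`, `site_id`, `x`, `y`), the lookup table, and the sample values are hypothetical placeholders; the actual geodatabase workflow is not described in enough detail here to reproduce it.

```python
import csv
import io


def attach_coordinates(records, site_coords, site_key="site_id"):
    """Return copies of the records with x and y columns looked up by site key."""
    merged = []
    for row in records:
        x, y = site_coords[row[site_key]]  # KeyError signals an unmapped site
        merged.append({**row, "x": x, "y": y})
    return merged


def to_csv(rows):
    """Serialize a non-empty list of dict rows to CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up records and coordinates, for illustration only.
    records = [{"dataset_id": "D1", "site_id": "A", "value": "3.2"}]
    coords = {"A": (1.0, 2.0)}
    print(to_csv(attach_coordinates(records, coords)))
```

        Because every output row carries its own coordinates and identifiers, the resulting file can be loaded into a spreadsheet, database, or GIS without a separate join step.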

      2. Website, Data Catalogs, and Geoportals

        The JRN website provides access to data via a data catalog and geoportal, as well as personnel information, publications, research proposals, reports, and other information about JRN and its research activities and collaborations. The website follows the LTER website design recommendations (http://im.lternet.edu/sites/im.lternet.edu/files/LTER_Web_Site_Design_and_Content_Guidelines_V1.1_0.pdf). Recently, the Jornada moved to an open source content management system (Drupal) to host websites for all Jornada research projects and collaborations. The JRN website has been moved into this combined website and has implemented the Drupal Environmental Information Management System (DEIMS) to support the data catalogs. DEIMS was initially developed by the LTER Network Office and has been adopted by several other LTER sites (ARC, LUQ, NTL, NWT, PIE, SEV, VCR) as a common approach to making data available and to generating EML that follows LTER best practices, which will be harvested into the LTER Network Information System (NIS). EML 2.1.0 generated by DEIMS is automatically harvested into the current LTER Data Portal (metadata search engine).

        The Jornada has implemented ESRI open source geoportals in JIMS. The geoportals provide textual searches via keywords as well as the ability to query by geographic extent and map research site locations. Although primarily developed to facilitate the distribution of spatial data sets, the geoportal can also be used to query and deliver a wide range of products, including documents and tabular data, and to integrate with other data portals (using web services). Registered users can save multiple search terms for use when they revisit the site at a later date. Data providers can manually publish data sets in the portal, or the geoportal can be configured to automatically publish data sets when properly formatted FGDC metadata are added or updated in a specific internal directory. The interface will also allow a user to select a bounding extent to limit or clip spatial data sets and automatically e-mail the customized files to the user as a zip file. The geoportal can deny access to restricted data sets or grant access only to selected registered users, as defined by the portal administrator.
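        The idea of limiting results to a user-selected bounding extent can be sketched as a simple point filter. This is not the geoportal's own clipping code: it assumes point records with hypothetical `x` and `y` fields and does not clip lines or polygons, which would require a GIS library.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """Axis-aligned bounding extent in the same units as the point coordinates."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def clip_to_extent(points, extent):
    """Keep only the points that fall inside the extent (edges included)."""
    return [
        p for p in points
        if extent.min_x <= p["x"] <= extent.max_x
        and extent.min_y <= p["y"] <= extent.max_y
    ]


if __name__ == "__main__":
    # Made-up site locations, for illustration only.
    sites = [
        {"site": "A", "x": 1, "y": 1},
        {"site": "B", "x": 5, "y": 5},
        {"site": "C", "x": 2, "y": 9},
    ]
    print(clip_to_extent(sites, Extent(0, 0, 3, 3)))
```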

        We plan to integrate Drupal and the geoportals to allow seamless access to both systems without requiring users to log in to each one separately. Currently, most JRN EML files in the LTER Data Portal point to the specific data set location within the JRN data catalog on our old website. As GIS-enabled data packages are created, the EML files are updated using Drupal to point to the data package.


    6. Networking and Computing Services

      1. Network Services

        The Jornada site offices and laboratories located at Wooton Hall on the campus of NMSU are connected to a local area network (LAN) through a firewall to the NMSU network (Gigabit Ethernet). Most computers and all servers are connected to the LAN using Gigabit Ethernet (1000 Mb/s). Jornada plans to increase bandwidth from the field station to the NMSU campus from 1.54 Mb/s to 50-75 Mb/s as soon as possible using high-speed, multi-hop, point-to-point wireless radios. The increased bandwidth will support streaming data and video, and remote education (K-12) from the wireless network covering the research site (78,000+ hectares). Jornada plans to continue increasing the wireless coverage (cloud) across the research site to provide Wi-Fi and 900 MHz spread spectrum connectivity for researchers, educators, and scientific instrumentation.

      2. Computer Services

        Jornada servers support two resource pools: development and production. Each resource pool supports multiple virtual servers running multiple operating systems (Linux, Windows). The resource pools are configured to provide high availability and workload balancing to ensure server availability 24 hours a day, 365 days a year. If one of the physical servers (hypervisors) within a pool fails or is brought down for maintenance, the virtual servers running on it are automatically transferred to another hypervisor. A user connected to services provided by one of the virtual servers will notice a slight delay (15-30 seconds) but will otherwise see no apparent effect from the virtual server being transferred to another hypervisor. Workload balancing allows virtual servers to be redistributed to other hypervisors in the resource pool to ensure optimal performance in the event a hypervisor starts to slow down due to workload. Currently, the Jornada has 4 physical servers within the production resource pool and 2 in the development pool. Additionally, servers that are not virtualized provide directory services (Active Directory, LDAP), backup, and workload balancing storage for the resource pools. Server storage is centralized using a storage area network (SAN) and provides 93 TB of storage capacity. The servers and SAN are connected redundantly to allow for hardware failure without impacting server performance.
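        The failover behavior described above can be pictured with a toy model. The sketch below is not the algorithm used by the Jornada virtualization platform, which this plan does not name; it assumes a naive policy that moves each virtual server from a failed hypervisor to whichever surviving hypervisor currently hosts the fewest virtual servers. All host and server names are hypothetical.

```python
def fail_over(placement, failed_host):
    """Reassign virtual servers from a failed hypervisor to the survivors.

    placement maps hypervisor name -> list of virtual server names.
    Returns a new mapping without the failed hypervisor.
    """
    survivors = {
        host: list(vms) for host, vms in placement.items() if host != failed_host
    }
    if not survivors:
        raise RuntimeError("no hypervisor left in the resource pool")
    for vm in placement.get(failed_host, []):
        # Naive balancing: pick the host with the fewest virtual servers,
        # breaking ties by host name so the result is deterministic.
        target = min(survivors, key=lambda host: (len(survivors[host]), host))
        survivors[target].append(vm)
    return survivors


if __name__ == "__main__":
    pool = {"hv1": ["web", "db"], "hv2": ["gis"], "hv3": [], "hv4": ["dev"]}
    print(fail_over(pool, "hv1"))
```

        A production scheduler would weigh CPU, memory, and storage load rather than a simple count, but the count-based version is enough to show why spare capacity in the pool is what makes transparent failover possible.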


      3. Figure 2, Jornada Information Management System

      4. Data Protection

        Multiple forms of backup are incorporated to protect data and systems from disaster and to allow for rapid recovery in case a disaster does occur. Servers and switch closets are physically secured and environmentally controlled to provide security and protection for network and server equipment. Differential backups are performed nightly on all servers and many desktop computers using a dual-drive LTO-4 tape library directly attached to the SAN. Backup media are stored off-site in case of catastrophe and are reused after 3 months. Virtual server snapshots are performed prior to system upgrades or modifications to allow rapid recovery in the event these alterations produce undesired results. The data archive volume is backed up routinely to DVDs and hard drives, which are stored off-site. The DVDs are not reused, but are saved indefinitely. We are exploring mechanisms to automate and schedule server snapshots with little or no additional cost. We are also exploring disk-to-disk backups and alternative technologies to replace our aging tape library.
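        The media reuse rule can be expressed as a short calculation. The sketch below assumes that "3 months" is approximated as 90 days and that each tape is tracked by a label and the date it was last written; both the approximation and the tape labels are illustrative assumptions rather than details of the actual backup software.

```python
from datetime import date, timedelta

# Assumption: the 3-month reuse period is approximated as 90 days.
RETENTION = timedelta(days=90)


def reusable_media(written_on, today):
    """Return the sorted labels of media whose retention period has elapsed.

    written_on maps a media label to the date it was last written.
    """
    return sorted(
        label for label, written in written_on.items()
        if today - written >= RETENTION
    )


if __name__ == "__main__":
    # Hypothetical tape labels and dates.
    tapes = {
        "T1": date(2012, 1, 15),
        "T2": date(2012, 3, 1),
        "T3": date(2012, 1, 31),
    }
    print(reusable_media(tapes, date(2012, 5, 1)))
```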

    7. Information Management Team

      Our information management team consists of four full-time staff jointly supported by JRN and USDA (Ken Ramsey: Information Manager; Jim Lenz: Network and Systems Administrator; Valerie LaPlante: Multimedia and Website Administrator; Scott Schrader: GeoPortal Administrator). Student employees and graduate assistants support data entry and computer programming efforts. Team members' skill sets complement one another, with some overlap to allow for temporary absence and employee turnover.

  3. Milestones and Deliverables Relative to Current LTER Network Activities

    Ongoing JRN participation in LTER network-wide activities includes the LTER Data Portal, All-Site Bibliography, and Climate databases, as well as representation and participation at the annual Information Managers (IM) Meeting, IM Executive Committee, and NIS workshops. These activities are associated with expanding the capability of the JRN to acquire, maintain, and exchange information in a timely fashion to meet our milestones and deliverables (Figure 3), and to share this information with other LTER and non-LTER users via the JRN website and the NIS being developed at the LTER Network Office (LNO).

    Members of the JRN IM team are active participants in the NIS development. Ken Ramsey is participating in 3 NIS development tiger teams. Ken Ramsey and John Anderson are participating in 3 cross-site IM working groups (WG) to advance efforts to prepare site data and associated metadata for inclusion in the NIS: the SensorNIS WG is developing best practices for preparing near real-time streaming sensor data; the DEIMS WG is developing a common approach for creating EML; and the GeoNIS WG is developing best practices for inclusion of GIS and remote sensing data. We continue to develop the EcoTrends website by adding data sets from more than 50 sites within the US and abroad, deriving new data variables, and improving data accessibility and analytical tools. This website was migrated from the LNO to a Jornada virtual server in LTER-V. The content was updated during this process to correct problems or omissions that had not been previously identified.

    We continue to develop the next iteration of the EcoTrends website and have dedicated 4 full-time staff and several students to this project. We plan to implement the LTER NIS at the Jornada as soon as possible to explore integration of the new EcoTrends website with the data and metadata web services of the NIS. We are also integrating the P2ERLS website, which holds general information (e.g., ecosystem type, long-term mean precipitation, temperature) from more than 300 sites distributed globally (http://www.p2erls.net), with the EcoTrends website (http://www.ecotrends.info).

    Recently, a video by Bob Robbins illustrating problems he encountered while trying to access data from each LTER website prompted activity and discussion within the LTER Network. JRN responded quickly to these problems by immediately implementing a website redirect to forward users to the current data catalog page. We then created EML using DEIMS to replace the JRN EML documents used to search for our data sets on the LTER Data Portal. These documents now point directly to the appropriate data set section of the data catalog. During this process, we increased the quantity of data sets in the LTER Data Portal as well as the quality and congruency of the EML metadata. Through Ken Ramsey's membership on the NIS Data Portal tiger team, we will continue to work with the LNO and the NIS developers to ensure that the current LTER Data Portal and planned LTER NIS Data Portal allow users to more easily access JRN data sets and associated metadata.



  4. Figure 3, Milestones and Deliverables

    NOTE: The term 'NIS Ready' indicates that the EML metadata are complete, accurately describe the related data file, and follow LTER best practices currently being defined by the LTER IMC as the NIS is developed by the LNO.

  5. LTER Network and Community Activities

    The Jornada LTER has collaborated with several non-LTER organizations to explore the implementation of new technologies and approaches to enhance the usefulness and accessibility of JRN data, associated metadata, and ancillary information. Collaborations between the San Diego Supercomputer Center (SDSC) and the LTER Network Office supported a cross-site LTER workshop and an SDSC GIS and web services development workshop, which resulted in a collaboration with SDSC to explore the use of web services to provide access to JRN climate data.

    Collaborations with the Evergreen State College Canopy Project (http://canopy.evergreen.edu/) and several LTER sites (BES, JRN, LUQ, SEV, SGS) explored the use of Databank, an XML template approach to database structure reuse and development with an intuitive interface and associated analytical tools. The XML templates are associated with 2D and 3D visualization and preliminary analytical tools. When a template is used to create a new research database, the associated visualization and analytical tools are available for the newly created database without further development requirements. This approach to data structure reuse and automated EML metadata documentation will be a useful tool for both scientists and information managers.

    The Shortgrass Steppe LTER (http://sgs.lternet.edu/), Sevilleta LTER (http://sev.lternet.edu/), and Jornada LTER sites expanded the initial collaboration with the Canopy Project to explore using Databank to perform cross-site analysis of grassland NPP data, an effort known as the Grasslands Data Integration (GDI). The Kruger National Park (South Africa) and the Konza Prairie LTER (http://knz.lternet.edu/) subsequently joined this cross-site collaboration. The collaboration has also moved from an information management research focus to one of broader scientific interest, centered on NPP.

    Jornada scientists and information managers collaborated with the NMSU Bioinformatics Center and the Sapelo Island Microbial Observatory (SIMO)/Georgia Coastal Ecosystems LTER (http://gce.lternet.edu/) to develop a database system to store and access genomic and biochemical information collected by JRN scientists. This system will include interfaces to access and utilize the data and associated metadata that will be stored within the system.

    The Jornada LTER and the Chihuahuan Desert Nature Park (CDNP) are collaborating to develop and host websites and tools to support K-12 student ecological research activities and workshops for teachers in southern New Mexico and western Texas public schools. We are initially developing an ecological glossary, a "Meet the Scientist" website, and the capability for teachers to upload data collected by students to a central JRN server and to access those data. Future activities include developing classroom web pages to highlight achievements and provide access to the class data. The Jornada LTER will develop the databases, websites, and services to support and host the schoolyard activities conducted by CDNP.