Thursday, March 26, 2015

[repost] The rise of the ‘Data Journal’

This article is reposted from http://figshare.com/blog/The_rise_of_the_Data_Journal_/149

Recently, we noted that 2015 seems to be the year that funders get serious about academic data. With the emergence of open data mandates, we are now talking about ‘when’, not ‘if’, the majority of academic outputs will live openly on the web. Funders, governments, and institutions are already preparing for how this content should best be managed and preserved. Yet, if we think about it, none of these three stakeholders has controlled the dissemination of academic content for the last 350 years. That has been the remit of academic publishers.

The last wave of funder mandates around open access has brought some fundamental changes to the business models of academic publishing. This raises the question of whether publishers can serve as the disseminators of academic data too. You may not be aware, but publishers have been releasing data journals with increasing frequency over the last 5 years, the most recent being Elsevier’s Data in Brief.

Perhaps the best received journal in this space has been Nature’s Scientific Data. This journal focuses on ‘Data Descriptors’, as opposed to results-based research articles. This serves as a way to preserve datasets in a well-curated manner, something that we at figshare have been exploring for some time now. To do this, the journal has partnered with the ISA-Tab community. ISA-Tab is a general-purpose tabular format built around three levels: the ‘Investigation’ (the project context), the ‘Study’ (a unit of research) and the ‘Assay’ (an analytical measurement). It helps you provide a rich description of the experimental metadata (e.g. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable.
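
To make the three-level structure a little more concrete, here is a minimal Python sketch that models an Investigation containing Studies, each of which contains Assays linking samples to data files. It is an illustrative assumption only: it is not the actual ISA-Tab file syntax (which uses separate tab-delimited investigation, study and assay files) and not the API of any ISA tooling. All class names, fields and example values below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

# Hypothetical model of the ISA hierarchy; not the ISA-Tab file format
# itself and not the API of any ISA library.

@dataclass
class Assay:
    """An analytical measurement run on some of a study's samples."""
    measurement_type: str
    technology_type: str
    # Sample-to-data relationships: sample name -> data files it produced.
    sample_to_files: Dict[str, List[str]] = field(default_factory=dict)

@dataclass
class Study:
    """A unit of research: the samples collected and the assays run on them."""
    title: str
    samples: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)

@dataclass
class Investigation:
    """The project context that groups related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

def sample_to_data(inv: Investigation) -> Iterator[Tuple[str, str, str]]:
    """Walk Investigation -> Study -> Assay and yield (study, sample, file) triples."""
    for study in inv.studies:
        for assay in study.assays:
            for sample, files in assay.sample_to_files.items():
                for data_file in files:
                    yield study.title, sample, data_file

# Example with made-up values.
inv = Investigation(
    title="Soil microbiome survey",
    studies=[
        Study(
            title="Site A sampling",
            samples=["S1", "S2"],
            assays=[
                Assay(
                    measurement_type="metagenome sequencing",
                    technology_type="nucleotide sequencing",
                    sample_to_files={"S1": ["s1_reads.fastq"], "S2": ["s2_reads.fastq"]},
                )
            ],
        )
    ],
)

for row in sample_to_data(inv):
    print(row)
```

Walking the hierarchy this way lets each data file be traced back to the sample and study it came from, which is the kind of sample-to-data relationship the format is meant to record.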

Interestingly, almost without fail, data journals are using existing repository infrastructure to preserve the data and uphold its provenance. As can be seen in the schematic above, figshare is working with Scientific Data to ensure that researchers’ data remain maintained and discoverable for a long time to come. Scientific Data’s early success has been enhanced by its use of figshare. As a recent blog post pointed out, figshare is the journal’s most popular data repository to date, hosting data associated with about a third of its published articles. This may be a direct result of the seamless integration between the two systems. Authors can upload data directly to figshare while submitting a Data Descriptor manuscript to Scientific Data, without leaving the journal’s manuscript submission system. Data are kept private during peer review (shared only with the Scientific Data Editorial Board and referees) and released to the public upon publication of the Data Descriptor. As open access article processing charges (APCs) become more commonplace, publishers are having to respond to researchers’ expectation of a constantly improving user interface, like the ones they are familiar with on other web platforms. We will continue working on these seamless integrations to make researchers’ lives easier, so they can get on with their research!

Following the U.S. Government’s Office of Science and Technology Policy (OSTP) 2013 memorandum, which called for all federal agencies funding data collection to create plans for public access to research projects, an interdisciplinary working group of domain repositories made a plea for different funding models for repositories:

“Effective and innovative funding models are needed to ensure that research data, so vital to the scientific enterprise, will be available for the future. Funding models also need to assure equal access to data preservation and curation services regardless of the researcher's institutional affiliation.”

Previous examples of grant-funded repositories, such as Tranche in the proteomics community, have demonstrated the fragility of such models. As both figshare and Dryad have demonstrated, innovative funding models can work in this space. As figshare continues to provide services that help publishers ensure the longevity and availability of data, we hope that more publishers follow Scientific Data’s lead and focus on what they do so well: curation. The long-form article incentivises researchers to add layers of metadata and background information that are hard to capture in machine-readable metadata alone. Combined with the recognition associated with peer-reviewed articles, this means that the publishing and data repository communities have a lot to gain from working together in ways like this.