Thursday, March 26, 2015

More ways to get Landsat Data

http://landsat.gsfc.nasa.gov/?p=10221

Mar 26, 2015 • Last Thursday, Amazon Web Services (AWS) announced that it is now hosting Landsat 8 imagery on its publicly accessible Simple Storage Service (S3). With help from the White House Office of Science and Technology Policy and the U.S. Geological Survey (manager of the vast Landsat archive), AWS has made over 80,000 Landsat 8 scenes (roughly 85 TB of data) available as one of its AWS Public Data Sets, and hundreds of new Landsat 8 scenes are added each day as they are collected. Each spectral band of each Landsat scene is available as a stand-alone GeoTIFF.
AWS also plans to add historic data from Landsats 1, 2, 3, 4, 5, and 7. The AWS Public Data Sets infrastructure is a “centralized repository of selected public data sets that can be integrated into AWS cloud-based applications to reduce the time and cost associated with transferring large data sets.”
Jed Sundwall, writing for the Amazon Web Services Official Blog, stated “As we said in December, we hope to accelerate innovation in climate research, humanitarian relief, and disaster preparedness efforts around the world by making Landsat data readily available near our flexible computing resources. We have committed to host up to a petabyte of Landsat data as a contribution to the White House’s Climate Data Initiative. Because the imagery is available on AWS, researchers and software developers can use any of our on-demand services to perform analysis and create new products without needing to worry about storage or bandwidth costs.”
In 2013, AWS teamed up with NASA’s Earth Exchange and the USGS to offer the Landsat Global Land Survey data set (s3://nasanex/Landsat). That data set included mostly cloud-free Landsat images covering the globe for four distinct time periods: the mid-1970s and circa 1990, 2000, and 2005. The new Landsat on AWS public data set will offer all Landsat 8 data and eventually all Landsat data (up to 1 PB).
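
For readers who want to pull a single band of a Landsat 8 scene straight into R, a minimal sketch follows. It is not an official AWS or USGS example: the "landsat-pds" bucket layout and the scene ID are assumptions used for illustration, and the raster package simply reads the downloaded GeoTIFF.

## A minimal sketch, not an official example; the bucket path and scene ID
## below are assumptions for illustration. Substitute a real scene ID from
## the scene list published with the AWS data set.
library(raster)                            # install.packages("raster") if needed

scene <- "LC80420342015111LGN00"           # hypothetical Landsat 8 scene ID
band  <- 4                                 # band 4 = red
url   <- sprintf("http://landsat-pds.s3.amazonaws.com/L8/042/034/%s/%s_B%d.TIF",
                 scene, scene, band)

tif <- file.path(tempdir(), basename(url))
download.file(url, tif, mode = "wb")       # each band is a stand-alone GeoTIFF
red <- raster(tif)                         # load the single-band GeoTIFF
plot(red, main = sprintf("%s band %d", scene, band))
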
In 2010, Google Earth Engine announced that it would host Landsat 30 m data (from 1984–2012). At that time the then-USGS Director, Marcia McNutt, wrote: “Landsat has become, over the years, a vital reference worldwide for understanding scientific issues related to land use and natural resources. With its long term historical record of the entire globe and widely recognized high quality of data, Landsat is valued all over the world as the “gold standard” of land observation.”
This past December, Secretary of the Interior, Sally Jewell, speaking of the Climate Data Initiative’s goal to share data including Landsat, said “By unleashing the power of our vast and open data resources, the Climate Data Initiative helps spark private sector innovation and will leverage resources for those on the front lines who are dealing with climate change. We are pooling into one place data from across the federal government to make it more accessible to the public and we hope our efforts will inspire other countries to follow suit.”
 

[repost] The rise of the ‘Data Journal’

The article is from http://figshare.com/blog/The_rise_of_the_Data_Journal_/149

Recently, we noted that 2015 seems to be the year that funders get serious about academic data. With the emergence of open data mandates, we are now talking about ‘when’, not ‘if’, the majority of academic outputs will live openly on the web. Funders, governments, and institutions are already making preparations for how this content should be best managed and preserved. But if we think about it, the three stakeholders mentioned above have not controlled the dissemination of content for the last 350 years. This has been the remit of academic publishers.

The last wave of funder mandates around open access has meant some fundamental changes in the business models of academic publishing. This raises the question of whether publishers can serve as the disseminators of academic data too. You may not be aware of it, but publishers have been releasing data journals with increasing frequency over the last five years, the most recent being Elsevier’s Data in Brief.

[Schematic from the original post: the Scientific Data / figshare data submission workflow]
Perhaps the most well received in this space has been Nature’s Scientific Data journal. This journal focuses on ‘Data Descriptors’, as opposed to results-based research articles. This serves as a way to preserve datasets in a well-curated manner, something that we at figshare have been exploring for some time now. To do this, they have partnered with the ISA-Tab community. ISA-Tab is a general-purpose tabular format built around the ‘Investigation’ (the project context), the ‘Study’ (a unit of research), and the ‘Assay’ (an analytical measurement); it helps you provide a rich description of the experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable.
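
Because ISA-Tab files are plain tab-delimited text, they can be inspected with base R; the sketch below is only illustrative, with a made-up file name and column headers of the kind ISA-Tab study files typically use.

## A minimal sketch only; "s_example_study.txt" is a hypothetical ISA-Tab
## study file, and the column names are typical ISA-Tab headers used here
## purely for illustration.
study <- read.delim("s_example_study.txt", sep = "\t",
                    stringsAsFactors = FALSE, check.names = FALSE)

## Sample characteristics live in "Characteristics[...]" columns
head(study[, c("Source Name", "Characteristics[organism]", "Sample Name")])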

Interestingly, almost without fail, data journals are using existing repository infrastructure to ensure that the preservation and provenance of the data are upheld. As can be seen in the schematic above, figshare is working with Scientific Data to ensure that researchers’ data remain maintained and discoverable for a long time to come. Scientific Data’s early success has been helped by its use of figshare; as a recent blog post pointed out, figshare is their most popular data repository to date, hosting data associated with about a third of their published articles. This may be a direct result of the seamless integration between the two systems: authors can upload data directly to figshare while submitting a Data Descriptor manuscript within Scientific Data, without leaving the manuscript submission system. Data are kept private during peer review, shared only with the Scientific Data Editorial Board and referees, and released to the public upon publication of the Data Descriptor. As open-access APCs become more commonplace, publishers are having to respond to the need for the kind of constantly improving user interface that researchers are familiar with on other web platforms. We will continue working on these seamless integrations to make researchers’ lives easier, so they can get on with their research!

Following the 2013 memorandum from the U.S. Government’s Office of Science and Technology Policy (OSTP), which calls for all federal agencies that fund data collection to create plans for public access to the resulting research, an interdisciplinary working group of domain repositories argued for different funding models for repositories:

“Effective and innovative funding models are needed to ensure that research data, so vital to the scientific enterprise, will be available for the future. Funding models also need to assure equal access to data preservation and curation services regardless of the researcher's institutional affiliation.”

Previous examples of grant-funded repositories, such as Tranche in the proteomics community, have demonstrated the fragility of such models. As both figshare and Dryad have demonstrated, innovative funding models can work in this space. As figshare continues to provide services to publishers to help them ensure the longevity and availability of the data, we hope that more publishers follow Scientific Data’s lead and focus on what they do so well: curation. The long-form article incentivises researchers to add levels of metadata and background information that are hard to elicit as machine-readable metadata. This, combined with the recognition associated with peer-reviewed articles, means that the publishing and data-repository communities have a lot to gain from working together in ways like this.

Wednesday, March 18, 2015

Stack area plot with R

First, this is a very good reference webpage:
http://www.r-bloggers.com/data-mountains-and-streams-stacked-area-plots-in-r/

Basically, you can follow the post on the webpage above. Below are some explanations and further documentation for beginners, along with a small sketch of the script.
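
The original script is not reproduced here, so the snippet below is only a minimal sketch of the same idea using base R graphics: the data frame is invented, and plot.stacked is my own helper name rather than code from the linked post. (ggplot2's geom_area(position = "stack") achieves the same result with less code.)

## A minimal sketch of a stacked area plot in base R.
## The data are invented for illustration; plot.stacked is a hypothetical
## helper, not the function from the r-bloggers post.
set.seed(1)
x <- 1:50
y <- data.frame(a = abs(rnorm(50, 10, 2)),
                b = abs(rnorm(50,  6, 2)),
                c = abs(rnorm(50,  4, 1)))

plot.stacked <- function(x, y, cols = rainbow(ncol(y)), ...) {
  y.cum  <- t(apply(y, 1, cumsum))        # running totals give each layer's top edge
  bottom <- rep(0, length(x))
  plot(x, bottom, type = "n", ylim = c(0, max(y.cum)),
       xlab = "x", ylab = "stacked value", ...)
  for (i in seq_len(ncol(y))) {           # draw one filled polygon per series
    polygon(c(x, rev(x)), c(y.cum[, i], rev(bottom)),
            col = cols[i], border = NA)
    bottom <- y.cum[, i]
  }
}

plot.stacked(x, y, main = "Stacked area plot (illustrative data)")
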
Here is an example of the stacked area plot this script produces.


Wednesday, March 11, 2015

Where and how to find open data for GIS

GIS data has been, and will continue to be, obtained from field surveys via GPS and from remote sensing (historically using balloons, pigeons, kites, and rockets to take pictures, and today mainly aircraft and satellites).

A note on photointerpretation: aerial photographs need to be geometrically corrected for terrain effects, tilt, and relief displacement, producing orthophotos that can then be used in image processing.



However, data are now accumulating explosively and are increasingly organized into various databases. Brief introductions to these sources are given below (note: this list is subject to change as new general or subject-specific databases are found):

This guide from Duke University lists very good data sources:

http://guides.library.duke.edu/gisdata

Detailed introductions to the data sources are as follows:

1. http://opendata.arcgis.com/
Esri launched this ArcGIS Open Data site as a search engine for finding and downloading open datasets.
News report: http://www.gislounge.com/esri-launches-site-find-open-data/?utm_source=feedburner&utm_medium=twitter&utm_campaign=Feed%3A+gislounge+%28GIS+Lounge%29


2. http://geodata.grid.unep.ch/
This portal from the United Nations Environment Programme (UNEP) is a very good source of environmental data at both global and national levels.
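
Once a dataset has been downloaded from one of these portals, it can be loaded into R for inspection. The sketch below assumes a hypothetical shapefile name and uses the rgdal package; neither the file nor the layer name comes from the portals above.

## A minimal sketch only; "countries.shp" is a hypothetical file standing in
## for whatever shapefile you download from one of the portals above.
library(rgdal)                     # install.packages("rgdal") if needed

gis <- readOGR(dsn = ".", layer = "countries")   # reads countries.shp from the
                                                 # working directory
summary(gis)                                     # CRS, extent, attribute summary
plot(gis)                                        # quick look at the geometry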