Tuesday, April 3, 2012

CLIWOC (British, Spanish and Dutch shipping 1750-1855): Getting the data into R

on the Spatial analysis blog a nice visualisation of the major shipping routes of the British, Dutch and Spanish fleets in 1750-1850 was presented recently, based on the Climatological Database for the World's Oceans (CLIWOC). another, even nicer, visualisation of the same data was presented by unconsenting. in neither case was any code provided, in particular with regard to just getting the data into a readable format. so i did some digging on the CLIWOC home-page, with just that in mind.
being a Fedora Linux user i started by limiting my, albeit quick, trial to what was supposed to be a linux/unix readable format on the CLIWOC main database release page, this being the CLIWOC15.Z file. that trial was not very successful. i managed to read the data into R after getting rid of some "unwanted" characters that R did not "like". however, the fields (columns supposed to be semicolon delimited) were more or less messed up, and the number of records was also lower than specified on the page. most importantly, the trial was based on non-reproducible code within the R environment.
so, reluctantly, i decided to have a go at the MS Access database source, CLIWOC15_2000.zip, albeit not having high hopes that i could find a solution that would work within the Linux environment. but after some searching on the web on how to read the mdb format directly into R on Linux i stumbled across this post. in particular: "Use mdb.get() from Hmisc package to import entire tables from the database into dataframes." just what the doctor ordered. now, i had the Hmisc library already installed, but i did not have any success with the mdb.get() function. reading the help file on mdb.get (?mdb.get) one "gets": "Uses the mdbtools package executables mdb-tables, mdb-schema, and mdb-export. in Debian/Ubuntu Linux run apt get install mdbtools." being a Fedora user the equivalent command to install mdbtools is:
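for anyone who wants to retry the text route, here is a minimal sketch of how one might diagnose messed-up fields and "lost" records in a semicolon-delimited file. the lines below are made-up toy data, not real CLIWOC records, and i have not confirmed that quoting was the actual cause in the CLIWOC15 file. the idea is simply that read.table by default treats apostrophes as quote characters and '#' as a comment character, and either can silently merge or truncate rows.

```r
# toy stand-in for a semicolon-delimited export; NOT real CLIWOC records
lines <- c(
  "ShipName;Lon3;Lat3",
  "Endeavour;-5.2;36.1",
  "St. John's;12.4;-33.9",
  "Hoop;4.1"
)

# count fields per line with quote and comment handling switched off,
# so apostrophes and '#' are treated as ordinary characters
con <- textConnection(lines)
n_fields <- count.fields(con, sep = ";", quote = "", comment.char = "")
close(con)

# line numbers whose field count differs from the header line
bad_rows <- which(n_fields != n_fields[1])

# read with the same settings; fill = TRUE pads short rows with NA
dat_txt <- read.table(text = lines, sep = ";", header = TRUE,
                      quote = "", comment.char = "", fill = TRUE,
                      stringsAsFactors = FALSE)
```

bad_rows then points at the lines worth inspecting by hand before trusting the import.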
yum install mdbtools
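a quick way to check from within R that the installation worked is to look for the three executables named in the help file. this is just a sketch using base R's Sys.which; the variable names are my own.

```r
# the three executables that ?mdb.get says it relies on
tools <- c("mdb-tables", "mdb-schema", "mdb-export")

# Sys.which returns a named character vector of full paths,
# with "" for anything not found on the PATH
found <- Sys.which(tools)
missing_tools <- names(found)[found == ""]

if (length(missing_tools) > 0) {
  message("mdbtools executables not found: ",
          paste(missing_tools, collapse = ", "))
} else {
  message("all mdbtools executables found")
}
```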
with that i was ready to go. with the following code i managed to achieve my objective of getting the CLIWOC MS Access data into R within Linux (as well as do some very crude initial ggplot2 plotting):

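require(Hmisc)   # needs mdbtools too; in Fedora do: yum install mdbtools
require(ggplot2) # coord_map() additionally needs the mapproj package
path <- "yourworkingdirectory"
URL <-  "http://www.knmi.nl/"
PATH <- "cliwoc/download/"
FILE <- "CLIWOC15_2000.zip"
# assumed step (URL, PATH and FILE are otherwise unused): fetch the zip archive
download.file(paste(URL,PATH,FILE,sep=""), file.path(path,FILE), mode="wb")
# unzip into the current directory; unzip() returns paths like "./name.mdb"
dir <- unzip(file.path(path,FILE))
file <- substr(dir,3,nchar(dir)) # strip the leading "./"
dat <- mdb.get(file)             # import all tables from the Access file
tmp <- dat$CLIWOC15[,c("Lon3","Lat3")]
ggplot(tmp,aes(Lon3,Lat3)) + geom_point(alpha=0.01,size=1) + coord_map()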
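
since only the CLIWOC15 table is used for the plot, one could also import just that table rather than the whole database. below is a rough sketch, assuming the tables argument of mdb.get behaves as described in its help file (tables=TRUE to list the table names, a character vector to import selected tables). the helper name read_cliwoc_table and the file name in the usage line are hypothetical.

```r
# hypothetical helper: list the tables in an Access file, then import one
read_cliwoc_table <- function(mdb_file, table = "CLIWOC15") {
  if (!file.exists(mdb_file)) {
    message("file not found: ", mdb_file)
    return(NULL)
  }
  # tables = TRUE asks mdb.get for the table names only (needs mdbtools)
  available <- Hmisc::mdb.get(mdb_file, tables = TRUE)
  if (!(table %in% available)) {
    stop("table '", table, "' not in: ", paste(available, collapse = ", "))
  }
  # import only the requested table
  Hmisc::mdb.get(mdb_file, tables = table)
}
```

usage would then be something like cliwoc <- read_cliwoc_table("CLIWOC15_2000.mdb"), where the .mdb name is a guess at what the zip contains and should be replaced by the name unzip() actually returns.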