Friday, April 13, 2012

ICES maturity: datamining, MS Access, R, plyr, reshape2 and ggplot2

ok, the title is a bit bloated

but a very short code:
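# mode="wb" keeps the binary .mdb intact on platforms that default to text mode
download.file("http://ices.dk/datacentre/StdGraphDB/FishStockDB.mdb",
              "FishStockDB.mdb", mode="wb")
require(Hmisc) # need also mdbtools, in Fedora do  >yum install mdbtools
FSDB <- mdb.get("FishStockDB.mdb") # list of data frames, one per table
require(plyr)
# within each stock, scale column F by its maximum so that sel peaks at 1
dat <- ddply(FSDB$Yieldrecruit[,c("FishStock","Age","Mat","F")],
             c("FishStock"),transform,sel=F/max(F))
require(reshape2)
# long format: one row per stock, age and variable (Mat or sel)
dat <- melt(dat[,c("FishStock","Age","Mat","sel")],id.vars=c("FishStock","Age"))
dat$variable <- ifelse(dat$variable %in% "Mat","Maturity","Selection")
require(ggplot2)
# one panel per stock, maturity and selection at age as separate lines
ggplot(dat,aes(Age,value,colour=variable)) + 
  geom_line() + 
  facet_wrap(~ FishStock) +
  xlim(0,10)
ggsave("ICESmaturity.png")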
is behind this picture.

and it demonstrates how to:
download data
access MS Access mdb files via R
use plyr
use reshape2
use ggplot2
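
to see what the plyr and reshape2 steps do without downloading anything, here is a minimal sketch on a made-up data frame. the stock names and all numbers are invented for illustration; only the column names follow the code above.

```r
library(plyr)
library(reshape2)

# made-up data: two hypothetical stocks, three ages each
toy <- data.frame(FishStock = rep(c("stockA", "stockB"), each = 3),
                  Age = rep(1:3, times = 2),
                  Mat = c(0, 0.5, 1, 0.2, 0.8, 1),
                  F   = c(0.1, 0.2, 0.4, 0.3, 0.6, 0.3))

# within each stock, divide F by that stock's largest F, so sel peaks at 1
scaled <- ddply(toy, "FishStock", transform, sel = F / max(F))

# stack Mat and sel into a single value column, labelled by variable
long <- melt(scaled[, c("FishStock", "Age", "Mat", "sel")],
             id.vars = c("FishStock", "Age"))
```

the long format is what lets ggplot2 map colour to variable and draw both curves in the same panel.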

the resulting graph, on the other hand, raises a myriad of fisheries-related questions, both biological and management ones. for instance: is there a relationship between how well the fisheries selection pattern is in sync with the biological pattern (here maturity) and the state of the stock or fishery? i leave that for somebody else to determine.

Tuesday, April 3, 2012

CLIWOC (British, Spanish and Dutch shipping 1750-1855): Getting the data into R

on the Spatial analysis blog a nice visualisation of the major shipping routes of the British, Dutch and Spanish fleets in 1750-1850 was presented recently, based on the Climatological Database for the World's Oceans (CLIWOC). another, even nicer, visualisation of the same data was presented by  unconsenting. in neither case was any code provided, in particular with regard to just getting the data into a readable format. so i did some digging on the CLIWOC home-page, with just that in mind.
being a Linux Fedora user i limited my, admittedly quick, first trial to what was supposed to be a linux/unix readable format on the CLIWOC main database release page, namely the CLIWOC15.Z file. that trial was not very successful. i managed to read the data into R after getting rid of some "unwanted" characters that R did not "like". however, the fields (columns, supposed to be semicolon delimited) were more or less messed up, and the number of records was lower than specified on the page. most importantly, the cleaning was done outside R, so the trial was not based on reproducible code within the R environment.
so, reluctantly, i decided to have a go at the MS Access database source, CLIWOC15_2000.zip, albeit not having high hopes of finding a solution that would work within the Linux environment. but after some searching on the web on how to read the mdb format directly into R under Linux i stumbled across this post. in particular: "Use mdb.get() from Hmisc package to import entire tables from the database into dataframes." just what the doctor ordered. now, i had the Hmisc library already installed, but i had no success with the mdb.get() function. reading the help file on mdb.get (?mdb.get) one "gets": "Uses the mdbtools package executables mdb-tables, mdb-schema, and mdb-export. in Debian/Ubuntu Linux run apt get install mdbtools." being a Fedora user, the equivalent command to install mdbtools is:
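
a messed-up delimited file like this can at least be diagnosed reproducibly from within R. the following is a minimal sketch, not what i did at the time: it writes a small made-up semicolon-delimited file to a temporary location (the contents are invented, not CLIWOC data) and uses count.fields() to find the lines whose number of fields differs from the header.

```r
# made-up semicolon-delimited content, with two deliberately broken lines
lines <- c("a;b;c",
           "1;2;3",
           "4;5",        # one field too few
           "6;7;8;9")    # one field too many
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# number of fields on each line; quoting and comment characters switched off
n <- count.fields(tmp, sep = ";", quote = "", comment.char = "")

# line numbers that do not match the field count of the first (header) line
bad <- which(n != n[1])
```

on a real file, inspecting readLines(tmp)[bad] would show which records need attention before read.table() is attempted.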
yum install mdbtools
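
whether R can actually see the three executables named in the help file can be checked before calling mdb.get(). a minimal sketch using base R's Sys.which(); the variable names are just for illustration:

```r
# the executables that the mdb.get help file says it relies on
tools <- c("mdb-tables", "mdb-schema", "mdb-export")

# full path of each executable, or "" when it is not on the PATH
found <- Sys.which(tools)

# names of the tools that are missing (empty if all are installed)
missing_tools <- tools[!nzchar(found)]
if (length(missing_tools) > 0) {
  message("not found on PATH: ", paste(missing_tools, collapse = ", "))
}
```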
with that i was ready to go. with the following code i managed to achieve my objective of getting the CLIWOC MS Access data into R under Linux (as well as doing a very crude first ggplot2 plot):

require(Hmisc) # need also mdbtools, in Fedora do  >yum install mdbtools
path <- "yourworkingdirectory" # placeholder: directory to download into
URL <-  "http://www.knmi.nl/"
PATH <- "cliwoc/download/"
FILE <- "CLIWOC15_2000.zip"
# file.path() adds the "/" that paste(path,FILE,sep="") would leave out
zipfile <- file.path(path,FILE)
download.file(paste(URL,PATH,FILE,sep=""),zipfile,mode="wb")
# unzip() extracts into the current working directory and returns "./<name>"
dir <- unzip(zipfile)
file <- substr(dir,3,nchar(dir)) # drop the leading "./"
dat <- mdb.get(file)
tmp <- dat$CLIWOC15[,c("Lon3","Lat3")]
require(ggplot2)
# coord_map() needs the mapproj package to be installed
ggplot(tmp,aes(Lon3,Lat3)) + geom_point(alpha=0.01,size=1) + coord_map() +
  ylim(-90,90)
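
the ylim(-90,90) in the plot makes ggplot2 drop points outside that range at plot time. an alternative is to filter the positions before plotting, so that the number of discarded records is known. a minimal sketch on made-up coordinates (the values are invented, only the column names Lon3 and Lat3 come from the code above; whether CLIWOC actually contains such records is not checked here):

```r
# made-up positions, including a missing value and two impossible coordinates
pos <- data.frame(Lon3 = c(-30.5, 12.0, NA, 200, 45.2),
                  Lat3 = c(10.2, -95, 50.1, 20, -33.9))

# keep rows where both coordinates are present and within valid ranges
ok <- !is.na(pos$Lon3) & !is.na(pos$Lat3) &
      abs(pos$Lon3) <= 180 & abs(pos$Lat3) <= 90
clean <- pos[ok, ]

# how many records were discarded by the filter
n_dropped <- nrow(pos) - nrow(clean)
```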
 