Sunday, July 1, 2012

FAO statistical areas in Google Earth

Some time ago I did a blog describing how to get ICES and NAFO statistical areas, originally as shapefiles into Google Earth readable format (ICES, NAFO). These areas are the primary fisheries areas upon which nominal fisheries catch statistics have been reported in the past (since 1905 when it comes to ICES).
A colleague of mine asked me if I was aware of an analogous format for the global FAO fisheries statistical areas. After some search I found indeed that FAO provides these statistical areas in a Google Earth readable format. But a little cumbersome for my taste. Luckily, the shapefiles are easily available at the FAO site. So here is my take on things:

################################################################################
# FAO statistical areas in Google Earth format
require(rgdal)
require(plotKML)
setwd("~/Dropbox/blogspotScripts/20120701FAOareas/") # your 'native' directory
url <- 'http://www.fao.org/geonetwork/srv/en/resources.get?id=31627&fname=fa_.zip&access=private'
download.file(url,destfile="~/Downloads/tmp.zip") # your 'native' directory
unzip("~/Downloads/tmp.zip") # your 'native' directory
fao <- readOGR(".","fa_")
setwd("~/Dropbox/Public/") # your 'native' directory
 
plot(fao) # a peek preview in R
 
# here one could simply do, via function from the plotKML package
kml(fao,file="tmpFAO.kml") # open your tmp.kml in GE to have a view
# but that stuff is no good
 
# to put some order to things, etc.
names(fao) <- tolower(names(fao))
dat <- as.data.frame(fao)
i <- data.frame(str_locate(fao$f_subarea,"_"))
i <- !is.na(i[,1])
fao$f_subarea[i]  <- str_sub(fao$f_subarea[i],2)
fao$f_division[i] <- str_sub(fao$f_division[i],2)
fao$f_subdivis[i] <- str_sub(fao$f_subdivis[i],2)
fao$name <- ifelse(!is.na(fao$f_subunit),fao$f_subunit,
                   ifelse(!is.na(fao$f_subdivis),fao$f_subdivis,
                          ifelse(!is.na(fao$f_division),fao$f_division,
                                 ifelse(!is.na(fao$f_subarea),fao$f_subarea,
                                        fao$f_area))))
fao <- as(fao,"SpatialLinesDataFrame")
fao$id <- as.numeric(fao$f_area)
area <- sort(unique(fao$f_area)) # put the stuff in order of the major areas
# again some function of the plotKML packages put in use
kml_open("fao.kml")
# filling up the russian doll
for (i in 1:length(area)) {
  tmp <- fao[fao$f_area %in% area[i],]
  kml_layer.SpatialLines(tmp,labels=tmp$name,folder.name = area[i])
}
kml_close("fao.kml")
kml_compress("fao.kml")  # create a compressed (KMZ) file
Created by Pretty R at inside-R.org

Did not quite get want I really wanted (plotKML is really in its infancy, just released to cran last week. But a very promising package). So played a little around (non reproducible code) to end up, preliminary with this, the following being a snapshot:

Tuesday, May 29, 2012

Google Earth and ocean depth contours

Been playing around in R for a while and then a bit in the GIS environment - within R of course. Do not know very much about GIS, but know what I want: Looking at GIS fisheries data across various scales (macro to global). Now we have myriads of websites that provide a view, on a 'data' on a scale. But herein lies the limitation, it is piecemeal information. Now what GIS software medium actually operates on all scales? Who else to call than Google Earth?
Here is an snapshot of the depth contours for from 0 to 1500 m by 100 meter intervals for the whole of the North Atlantic. Original data is from General Bathymetric Chart of the Oceans. All that was needed was some lines of code in R to convert point values to contour and then generate the kml-file. And adding on top of that the ICES statistical rectangles as a layer, coloured in red.

Zooming in or out in Google Earth environment is a breeze. An example on the macroscale, showing the Porupine bank is shown below.

Wandering if one should not use the Google Earth as the base platform to publish all fisheries and management information. Both the global as well as the local.

Monday, May 28, 2012

ICES and NAFO statistical areas viewed in Google Earth

On the ICES Spatial Facility webpage one can obtain a view of the ICES statistical areas. Now since the projection used is more than a bit misleading I was wandering how these statistical areas would look like in Google Earth. Thankfully the shapefiles associated with these rectangles are available to download  from the web page. What follows is less than a bare bone code to convert the shape files into kml files. Except that a bit of a thinning of the spatial codes was needed because the original shapefiles are very detailed for some of the coastal areas resulting in very nonsmoothed zooming etc. in Google Earth. While doing this I also included the NAFO areas. Since I also wanted to see the EEZ on top of this I include a code to convert EEZ shapefiles to kml. The EEZ data were downloaded from the VLIZ Maritime Boundaries Geodatabase.

Before running the code below, one needs to download the necessary data onto the local machine and extract the data from the zipped file into a working directory (not needed for NAFO areas - is included in the code below). Once this has been done the following code was used to convert the data into Google Earth format:

setwd("yourWorkingDirectory")
library(maptools)
# gpclibPermit() # may be needed
library(rgdal)
# ices areas
ices_areas = readOGR('ices_areas.shp','ices_areas')
ices_areas <- thinnedSpatialPoly(ices_areas,tolerance=0.04) # takes some time
writeOGR(ices_areas,'ices_areas.kml','ices_areas',driver='KML')
# world eez
eez <- readOGR('World_EEZ_v6_1_simpliefiedcoastlines_20110512.shp',
               'World_EEZ_v6_1_simpliefiedcoastlines_20110512')
writeOGR(eez, "eez.kml", "eez", driver="KML")
# nafo areas
download.file('http://www.nafo.int/about/overview/gis/Divisions.zip',
              destfile='NAFOdivisions.zip')
unzip('NAFOdivisions')
nafo_areas <- readOGR('NAFOdivisions.shp','NAFOdivisions')
writeOGR(nafo_areas,'nafo_areas.kml','nafo_areas',driver='KML')
Created by Pretty R at inside-R.org

Now one only needs to open the  kml files (ices_areas.kml, nafo_area.kml and eez.kml, located in the working directory) in Google Earth. If there are problems with downloading or running the scripts I provide the ices_areas.kmlnafo_areas.kml and eez.kml via my Public Dropbox space (there is also a depth contour in the E-Greenland, Iceland Faero Is. region, see nwwgDepth.kml)
As said, this is very much a bare bone attempt but at least describes the essence of getting a GE view of the ICES and NAFO statistical areas. Zooming in and out is a breeze and by clicking on an area some statistics show up. A snapshot of the broad view as presented in the projection on the ices page vs. the one provided in Google Earth are shown for comparisons. The global EEZ are a bonus :-)




Friday, April 13, 2012

ICES maturity: datamining, MS Access, R, plyr, reshape2 and ggplot2

ok, the title is a bit bloated

but a very short code:
download.file("http://ices.dk/datacentre/StdGraphDB/FishStockDB.mdb",
              "FishStockDB.mdb")
require(Hmisc) # need also mdbtools, in Fedora do  >yum install mdbtools
FSDB <- mdb.get("FishStockDB.mdb")
require(plyr)
dat <- ddply(FSDB$Yieldrecruit[,c("FishStock","Age","Mat","F")],
             c("FishStock"),transform,sel=F/max(F))
require(reshape2)
dat <- melt(dat[,c("FishStock","Age","Mat","sel")],id.vars=c("FishStock","Age"))
dat$variable <- ifelse(dat$variable %in% "Mat","Maturity","Selection")
require(ggplot2)
ggplot(dat,aes(Age,value,colour=variable)) + 
  geom_line() + 
  facet_wrap(~ FishStock) +
  xlim(0,10)
ggsave("ICESmaturity.png")
is behind this picture.

and it demonstrates how to:
download data
access MS Access mdb files via R
use plyr
use reshape2
use ggplot2

the resulting graph poses on the other hand, myriads of fisheries related questions. be they biological and/or management. like the question if there is a pattern between the sync/out-of-sync in the fisheries selection pattern and the biological pattern (here maturity) and the state of the stock/fisheries? leave that  for somebody else to determine.

Tuesday, April 3, 2012

CLIWOC (British, Spanish and Dutch shipping 1750-1855): Getting the data into R

on the Spatial analysis blog a nice visualisation of the major shipping route of the British, Dutch and Spanish fleet in 1750-1850 was presented recently. based on the Climatological Databases for the World's Oceans (CLIWOC). another even nicer visualisation of the same data was presented by  unconsenting. in neither case any code was provided, in particular with regards to just getting the data into a readable format. so i did some digging into the the CLIWOC home-page, with just that particularity in mind.
being a Linux Fedora user i started limiting my, albeit quick trial, with what was supposed to be a linux/unix readable format on the CLWVOC main database release page. this being CLIWOC15.Z file. that trial was not very successful. managed to read in the data into R after getting rid of some "unwanted" characters that R did not "like". however, the fields (columns supposed to be semicolon delimited) was more or less messed up. and the number of records were also less than specified on the page. most importantly the trial was based on non-reproducible code within the R-environment.
so, reluctantly i decided to have a go at the MS Access databases CLIWOC15_2000.zip source, albeit not having high hopes that i could find a solution that would work withing the Linux environment. but after some search on the web on how to read mdb format directly into R environment within Linux i stumbled across this post. In particular: "Use mdb.get() from Hmisc package to import entire tables from the database into dataframes." just what the doctor ordered. now i had the Hmisc library already installed. but I did not have the success with the mdb.get() function. reading the help file on mdb.get (?mdb.get) one "gets": "Uses the mdbtools package executables mdb-tables, mdb-schema, and mdb-export. in Debian/Ubuntu Linux run apt get install mdbtools." being a Fedora user the equivalent command to install the mdbtools is:
yum install mdbtools
with that I was ready to go. with the following  code i managed to achieve my objective of getting the CLIWOC MS Access data into R environment within the Linux framework (as well as do some very crude initial ggplot2):

require(Hmisc) # need also mdbtools, in Fedora do  >yum install mdbtools
path <- "yourworkingdirectory"
URL <-  "http://www.knmi.nl/"
PATH <- "cliwoc/download/"
FILE <- "CLIWOC15_2000.zip"
download.file(paste(URL,PATH,FILE,sep=""),
              paste(path,"CLIWOC15_2000.zip",sep=""))
dir <- unzip(paste(path,"CLIWOC15_2000.zip",sep=""))
file <- substr(dir,3,nchar(dir))
dat <- mdb.get(file)
tmp <- dat$CLIWOC15[,c("Lon3","Lat3")]
require(ggplot2)
ggplot(tmp,aes(Lon3,Lat3)) + geom_point(alpha=0.01,size=1) + coord_map() +
  ylim(-90,90)
 
Created by Pretty R at inside-R.org


Thursday, January 5, 2012

getting ICES 1903-1949 catch statistics into R


in the last post i showed how to get the nominal north east atlantic landings data from 1950 onwards into a usable format in R. this time around a script that shows how to import zipped excel files from 1903 to 1949 is provided. finalizing with a ggplot of the total reported landings over the period.

again the data is available at the web site (http://www.ices.dk/fish/CATChSTATISTICS.asp) as excel files. the twist is that the files are stored in a zipped format and there are separate excel files for each country. and within each country file there is a separate worksheet for each of the years. within each worksheet the data are again stored in a wide format, the species recorded in the landings is by rows and the area fished is by columns. the number of rows (species recorded) and the number of columns (area fished) differs among different worksheet. and then there are some other nasties here and there in the worksheets, something that took a little trial and error to circumvent in the final script provided.

this is what the final work looked like:
################################################################################
# stuff needed
require(gdata)
require(reshape2)
require(stringr)
require(plyr)

################################################################################
# the source
path <- "YOURDIRECTORY"
URL <-  "http://www.ices.dk/"
PATH <- "fish/statlant/"



################################################################################
# Downloading and extraction of 1903 to 1949 data
FILE <- "ICES1903-49.zip"
download.file(paste(URL,PATH,FILE,sep=""),
              paste(path,"ICES19030-49.zip",sep=""))
dir <- unzip(paste(path,"ICES19030-49.zip",sep=""))

files <- sort(dir[grep("xls",dir)])
files <- substr(files,3,nchar(files))
cntr <- substr(files,1,3)

dat <- NULL
for (i in 1:length(cntr)) {
  print(cntr[i])
for (j in c(2:48)) {
  print(j+1901)
  if (i == 1 & j == 44)  {                     # problem with header character
    tmp <- read.xls(files[i],sheet=j,strip.white=T,stringsAsFactors=F,
                  header=F,skip=1)
    names(tmp) <- c("species","alpha","IVc","IV","VIIa,f","VIId,e","UNK","Total")
  } else {
  tmp <- read.xls(files[i],sheet=j,strip.white=T,stringsAsFactors=F,blank.lines.skip = TRUE)
  }
  if(i == 19 & j == 25) tmp <- tmp[1,1:2]    # Russia 1926 (empty spaces)
  #print(tmp)
  if(ncol(tmp)>1 & nrow(tmp)> 1) {
    rowTotal <- grep("Total",tmp[,1]) # find the location of row Total
    colTotal <- grep("Total",names(tmp)) # find the location of column Total
    if(i == 6  & j == 22) {rowTotal <- 22 ; colTotal <- 8} # France  1923
    if(i == 12 & j == 7)  {rowTotal <- 18 ; colTotal <- 7} # Ireland 1908
    tmp <- tmp[1:(rowTotal-1),1:(colTotal-1)] # exclude anything below and to
                                              #  the right of Total, including the Total
    names(tmp)[1:2] <- c("species","alfa")    # the species acronym column is in most cases left blank
                                        # in some cases it is however names something odd
    tmp <- melt(tmp,id.var=c("species","alfa"))   # turn into long format (kind of like inverse of "Pivot table"
    tmp$variable <- as.character(tmp$variable)
    tmp$value <- str_replace_all(tmp$value,",","")
    tmp$value <- ifelse(tmp$value %in% "-",0,tmp$value)
    tmp$value <- ifelse(tmp$value %in% ".",NA,tmp$value)
    tmp$value <- ifelse(tmp$value %in% "<0.5",0.25,tmp$value)
    tmp$value <- ifelse(tmp$value %in% "",NA,tmp$value)
    tmp$value <- as.numeric(tmp$value)
    tmp$year <- rep(j+1901,nrow(tmp))
    tmp$cUN2 <- cntr[i]
    dat <- rbind(dat,tmp)
  }
} # end loop through sheets (years/worksheet)
} # end loop through countries (xls file)
require(ggplot2)
ggplot(dat,aes(factor(year),weight=value/1e6))+ geom_bar() + xlab("") +
  ylab("Nominal landings [million tonnes]")



the focus here has been on just getting the data into R. more work is needed, in particular when it comes to combining the data from 1903-1949 and 1950-2010. hope i will have some time to work and report on that later.




Monday, December 26, 2011

getting ICES 1950-2010 catch statistics into R

the International Council for the Exploration of the Sea (ICES) has been compiling nominal annual landings statistics by country and division for the north east atlantic since 1904. the data is available at http://www.ices.dk/fish/CATChSTATISTICS.asp.

the data from 1950 onwards is either available in the archaic FishStat Plus format or in an MS Excel format. in both cases the format of the data are in a wide format, where the country, species and division is listed in rows and the year is in column. what follows is a script in R that shows how the data can be extracted directly from the website and then put in a long format, more suitable for further exploration.

# required library
require(gdata)
require(reshape2)
require(stringr)
options(stringsAsFactors=F)

# downloading and extract
temp <- tempfile()
download.file("http://ices.dk/fish/statlant/ICES1950-2010.zip",temp)
dat <- read.xls(paste(unzip(temp, "ICES_1950-2010.xls"),sep=""),
                sheet=1,strip.white=T)
unlink(temp)

# some cleaning
names(dat) <- tolower(names(dat))
dat <- melt(dat,c("country","species","division"))
dat$variable <- as.character(dat$variable)
dat$variable <- as.integer(str_sub(dat$variable,2))
dat$value <- str_replace_all(dat$value,",","")
dat$value <- ifelse(dat$value %in% "-",0,dat$value)
dat$value <- ifelse(dat$value %in% ".",NA,dat$value)
dat$value <- ifelse(dat$value %in% "<0.5",0.25,dat$value)
dat$value <- ifelse(dat$value %in% "",NA,dat$value)
dat$value <- as.numeric(dat$value)
names(dat) <- c("country","species","descr","year","landings")
dat <- dat[!is.na(dat$landings),]
dat <- dat[dat$landings > 0,]
ices.1950.2010 <- dat
rm(dat,temp)


the data frame ices.1950.2010 now contains 5 variables, i.e. the country, the species, descr (the area fished), the year and the landings in tonnes. some initial scoping of the data, like the total nominal landings by year can now be simply plotted by e.g.
# a simple plot
require(ggplot2)
ggplot(ices.1950.2010,aes(year,weight=landings/1e6)) +
  geom_bar(binwidth=1) +
  ylab("Nominal landings (million metric tonnes)")


Further exploration of the data, e.g. landings by particular species and/or country in the format provided is a breeze. However landings by particular are (division, area and subdivision) require further manipulation on the field/variable "descr". more on that later.

einar