Monday, September 22, 2008

Extraction of Coral Features from MBES data using GRASS

In 2006 I was faced with a problem. Where I worked at the time we were acquiring large volumes of seabed information: discrete multi-beam echo sounder (MBES) data for bathymetry, and side scan sonar (SSS) to identify seabed features. Our vector database of classified coral outlines had not been updated from these datasets for a few years, and re-interpreting the lot was a big task.

We were not recording backscatter intensity from the MBES, and SSS has some inherently funky geometry which makes it hard to work with and to integrate with MBES for automated feature extraction purposes (although I did some interesting work on this in 1999/2000 that I may post if I find the results). I had some previous experience with terrain analysis and thought I could lend a hand.

I had done quite a bit of work in the past using second differentials of terrain (more commonly known as rugosity). If the first differential of a terrain dataset is slope, the rate of change of the vertical component over a horizontal surface, then the second differential is "slope of slope", the rate of change of that rate of change. This is essentially a measure of how rugged or undulating a surface is. The theory is that coral tends to exhibit higher rugosity than the surrounding seabed.
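
To make that concrete, here is a minimal C++ sketch of the idea (we actually did this with GRASS modules driven from bash; the grid layout, boundary handling and cell size here are assumptions): the same gradient-magnitude operator applied twice gives slope and then rugosity.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Gradient magnitude of a row-major grid using central differences.
    // Applied to heights this gives slope; applied to the slope grid it
    // gives the "slope of slope" (rugosity). Border cells are left at zero.
    std::vector<double> gradientMagnitude(const std::vector<double>& grid,
                                          std::size_t rows, std::size_t cols,
                                          double cellSize)
    {
        std::vector<double> out(grid.size(), 0.0);
        for (std::size_t r = 1; r + 1 < rows; ++r) {
            for (std::size_t c = 1; c + 1 < cols; ++c) {
                double dzdx = (grid[r * cols + (c + 1)] - grid[r * cols + (c - 1)]) / (2.0 * cellSize);
                double dzdy = (grid[(r + 1) * cols + c] - grid[(r - 1) * cols + c]) / (2.0 * cellSize);
                out[r * cols + c] = std::sqrt(dzdx * dzdx + dzdy * dzdy);
            }
        }
        return out;
    }

    // auto slope    = gradientMagnitude(heights, rows, cols, cell);
    // auto rugosity = gradientMagnitude(slope,   rows, cols, cell);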

Another technique to find the edges of the coral is to perform "inflection" analysis. In this case you take a smoothed, lower resolution version of the terrain (spline fitting is often well suited to this) and intersect it with a higher resolution model. The points at which the absolute difference between the two models is zero are often a pretty good approximation of the edge of the coral. This method sometimes works better than using the first differential (high slope) alone.
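
A rough sketch of that zero-crossing test, assuming the smoothed model has already been resampled onto the same grid as the high resolution one (the spline fitting itself is not shown):

    #include <cstddef>
    #include <vector>

    // Flag cells where the difference (high-res minus smoothed terrain)
    // changes sign against the neighbour to the east or south, i.e. where
    // the two surfaces intersect. These zero crossings approximate the
    // coral edge.
    std::vector<unsigned char> inflectionEdges(const std::vector<double>& highRes,
                                               const std::vector<double>& smoothed,
                                               std::size_t rows, std::size_t cols)
    {
        std::vector<unsigned char> edge(highRes.size(), 0);
        auto diff = [&](std::size_t r, std::size_t c) {
            return highRes[r * cols + c] - smoothed[r * cols + c];
        };
        for (std::size_t r = 0; r + 1 < rows; ++r)
            for (std::size_t c = 0; c + 1 < cols; ++c)
                if (diff(r, c) * diff(r, c + 1) <= 0.0 ||
                    diff(r, c) * diff(r + 1, c) <= 0.0)
                    edge[r * cols + c] = 1;
        return edge;
    }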

A combination of slope, rugosity, inflection analysis and various "grow" and "thin" filters eventually results in a fairly accurate extraction of the coral areas, although a bit of tuning is required depending on the resolution and general properties of the input terrain model.
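
For what it's worth, a single pass of a "grow" filter is just a morphological dilation of the candidate coral mask (GRASS provides modules like r.grow for this), and "thin" is its opposite. A minimal sketch, with the mask layout assumed:

    #include <cstddef>
    #include <vector>

    // One pass of a "grow" filter: set a cell if it or any of its 8
    // neighbours is set. Repeated passes fill small gaps in the candidate
    // coral mask; "thin" is the reverse (clear a cell unless all of its
    // neighbours are set).
    std::vector<unsigned char> grow(const std::vector<unsigned char>& mask,
                                    std::size_t rows, std::size_t cols)
    {
        std::vector<unsigned char> out(mask);
        for (std::size_t r = 1; r + 1 < rows; ++r)
            for (std::size_t c = 1; c + 1 < cols; ++c)
                for (std::size_t rr = r - 1; rr <= r + 1; ++rr)
                    for (std::size_t cc = c - 1; cc <= c + 1; ++cc)
                        if (mask[rr * cols + cc])
                            out[r * cols + c] = 1;
        return out;
    }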

I had worked on similar problems in the past using custom filters and kernels in ER Mapper, and also using ESRI's Spatial Analyst. As I was mucking around with GRASS under Linux at the time I decided to use that environment instead. I was extremely impressed! Being able to write bash scripts to automate and tie together many processes proved incredibly powerful.

After successfully extracting the coral features in GRASS we vectorised the results in Spatial Analyst, applied some sinuosity filters to remove jagged edge effects, and then updated the coral outlines data in our GIS.

The screen capture below shows the results against a dataset of varying quality. It should be noted that this was taken part way through the extraction process and does not represent the final classification. Unfortunately I could not find a screen cap of what was finally extracted, nor could I find the bash scripts we used in GRASS... I'll keep looking.

Sunday, September 21, 2008

Linux, The Birth of the Raster Engine and Seismic Matching Algorithms

In 2008 I was getting re-acquainted with *nix systems all over again. My primary computer at home was running Ubuntu Linux and OpenSolaris (UNIX), and I was getting into all the great open source software that's out there. For me that meant:
  • C++ / Java in the Eclipse and KDevelop IDEs
  • .NET programming on Mono
  • MySQL databases
  • PHP web programming
  • ... and of course the mighty GRASS GIS (which is one of my favourite serious GISs).
I even converted my entire Beowulf cluster (of only 4 nodes) from Windows Server 2003 to an Ubuntu variant (Mint).

I've always been the type of person who takes work home - I usually work a few hours in the evenings and on weekends just tidying things up. I guess it's OK when you're excited about what you're doing and/or you don't have a baby to look after (as my wife and I do now :) ). I don't feel right unless I have laboured over an interesting problem until I am completely knackered or the accompanying glass (or glasses) of wine has made me give up.

It was during this time that, for fun, I was writing my own raster engine in C++ - the idea was that I would get used to developing for Linux and also dust the cobwebs off my C++ skills / remember how to use pointers! The engine could read and write ASCII XYZ / CSV, ESRI ASCII and ZYCOR ASCII formats, and could address cells by coordinate - the usual stuff you would need from a basic raster manipulation library. I also hoped that by targeting Linux I could later use this engine for batch or parallel processing on a cluster of 64-bit machines, massively increasing the amount of memory I could address and therefore the size of rasters I could manipulate. I compiled on my Ubuntu OS using g++ (GCC), and during performance testing the engine easily beat the ESRI engine and even the GRASS engine (by a factor exceeding 100).
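
The core of such an engine is the mapping between world coordinates and cell indices. Something along these lines (the names and details are illustrative, not the actual engine; the georeferencing fields are the ones an ESRI ASCII header carries):

    #include <cstddef>
    #include <stdexcept>
    #include <vector>

    // A minimal regular grid carrying the georeferencing of an ESRI ASCII
    // header (ncols, nrows, xllcorner, yllcorner, cellsize). ESRI ASCII
    // stores the northernmost row first, so row 0 is the top of the grid.
    struct Grid {
        std::size_t rows = 0, cols = 0;
        double xll = 0.0, yll = 0.0;  // lower-left corner
        double cell = 1.0;            // cell size (square cells)
        std::vector<double> data;     // row-major, top row first

        // Address a cell by world coordinate: the column comes from the x
        // offset, the row is measured down from the grid's top edge.
        double& at(double x, double y) {
            long long c = static_cast<long long>((x - xll) / cell);
            long long r = static_cast<long long>((yll + rows * cell - y) / cell);
            if (r < 0 || c < 0 ||
                r >= static_cast<long long>(rows) || c >= static_cast<long long>(cols))
                throw std::out_of_range("coordinate outside grid");
            return data[static_cast<std::size_t>(r) * cols + static_cast<std::size_t>(c)];
        }
    };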

Fortunately a work problem presented itself for which my newly created raster engine would be perfectly suited...

The geophysicist in our group had noticed that some old seismic data was grossly mispositioned. Throughout the datasets he was looking at, it seemed that a number of lines did not match when he compared their seabed picks with real bathymetric data (where we were working at the time this was a fairly reliable check, because the seabed had a very distinct profile due to the presence of corals). He asked me if there was an automated way to perform this comparison, and whether it was even possible to determine what the correct translation and rotation of the seismic data might be.

I quickly realised that my C++ raster engine could be of use here. It was well suited to loading very large surfaces and could easily generate a profile given a series of coordinates. It was also possible to compare a given XYZ profile with one generated from my engine to determine whether the two profiles were similar. I also formulated a method whereby iterative coordinate translations and rotations were applied to the given profile until a closer match was found with the seabed profile. It took just over half an hour to run all permutations for a given profile.
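
In outline it was a brute-force search: rotate and translate the seismic line, sample the bathymetric surface along it, and score the candidate by the residuals against the seabed picks. A rough C++ sketch, where the sampling function, step sizes and search ranges are all assumptions rather than what we actually used:

    #include <cmath>
    #include <functional>
    #include <limits>
    #include <vector>

    struct Pt { double x, y, z; };            // a seabed pick from the seismic line
    struct Match { double dx, dy, theta, rms; };

    // Brute-force search over translations and rotations about the line's
    // centroid, scoring each candidate by the RMS of the vertical residuals
    // between the seismic picks and the bathymetric surface. surfaceZ is
    // assumed to sample the raster engine's surface at a world coordinate.
    Match bestMatch(const std::vector<Pt>& picks,
                    const std::function<double(double, double)>& surfaceZ)
    {
        double cx = 0.0, cy = 0.0;
        for (const Pt& p : picks) { cx += p.x; cy += p.y; }
        cx /= picks.size(); cy /= picks.size();

        Match best{0.0, 0.0, 0.0, std::numeric_limits<double>::max()};
        for (double theta = -0.05; theta <= 0.05; theta += 0.005)   // radians
            for (double dx = -200.0; dx <= 200.0; dx += 10.0)       // metres
                for (double dy = -200.0; dy <= 200.0; dy += 10.0) {
                    double sum = 0.0;
                    for (const Pt& p : picks) {
                        double x = cx + std::cos(theta) * (p.x - cx)
                                      - std::sin(theta) * (p.y - cy) + dx;
                        double y = cy + std::sin(theta) * (p.x - cx)
                                      + std::cos(theta) * (p.y - cy) + dy;
                        double dz = p.z - surfaceZ(x, y);
                        sum += dz * dz;
                    }
                    double rms = std::sqrt(sum / picks.size());
                    if (rms < best.rms) best = {dx, dy, theta, rms};
                }
        return best;
    }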

Fig 1: On top is the seabed profile generated from the raster engine (over a nice piece of coral) and below the absolute vertical component residuals from the comparison with the input profile from seismic data. A negative match is obvious.

It was at this point that I knew that if we were to process this across the entire seismic dataset then we would need to lease time on the dedicated batch processing compute cluster.

Where I worked we had 2 dedicated compute clusters, one of which was based on Red Hat Enterprise Linux (RHEL) (I'm not sure what the other ran, but I think it is also now RHEL-based). I had used Red Hat (before they went enterprise) when I was at university, so it wasn't too difficult to move to KDE from the Ubuntu GNOME environment. We recompiled the raster engine for RHEL and scheduled the jobs for processing on the cluster.

There are no flashy graphics in this post - but almost overnight we managed to process all of the seismic lines and determine at least which ones had dodgy positioning. It turned out that the translation / rotation matching algorithm was only mildly successful (I'm sure given time I could improve it) - we found matches for fewer than 20% of the lines using the approach adopted, but in most cases the match that was found made sense.