Sunday, September 21, 2008

Linux, The Birth of the Raster Engine and Seismic Matching Algorithms

In 2008 I was getting re-acquainted with *nix systems all over again. My primary computer at home was running Ubuntu Linux and OpenSolaris (UNIX) and I was getting into all the great open source software that's out there, for me that meant:
  • C++ / Java on Eclipse with K Develop IDEs
  • NET programming on Mono
  • MySQL database.PHP web programming
  • ... and of course the mighty GRASS GIS (which is one of my favourite serious GISs).
I even converted my entire beowulf cluster (of only 4 nodes) from Windows Server 2003 to a Ubuntu variant (Mint).

I've always been the type of person who takes work home - and I usually work a few hours in the evenings and on the weekend just tidying things up. I guess it's OK when you're excited about what you're doing and / or you don't have a baby to look after (as my wife and I do now :) ). I don't feel right if I have not laboured over an interesting problem until I am completely knackered or the accompanying glass/es of wine haven't made me give up.

It was during this time that for fun I was writing my own raster engine in C++ - the idea was that I would get used to developing for Linux and also dust the cobwebs off my C++ skills / remember how to use pointers! The engine could read in and out ASCII XYZ / CSVs, ESRI ASCII and ZYCOR ASCII formats as well as address cells by coordinate etc - the usual stuff you would need from a basic raster manipulation library. I also hoped that by targeting Linux I could later use this engine for batch or parallel processing on a cluster of 64bit machines - massively increasing the amount of memory I could address and therefore the size of rasters I could manipulate. I compiled on my Ubuntu OS using g++ (GCC) and during performance testing the engine easily beat the ESRI engine and even the GRASS engine (by a factor exceeding 100).

Fortunately a work problem presented itself for which my newly created raster engine would be perfectly suited...

The geophysicist in our group had noticed that some old seismic data was not correctly positioned (grossly). Throughout the datasets he was looking at it seemed that a number of lines did not match when he compared the seabed picks with real bathymetric data (where we were working at the time this was a fairly reliable method of checking because the seabed had a very distinct profile due to the presence of corals). He asked me if there was an automated way to perform this comparison and if it were possible to even determine what the correct translation and rotation of the seismic data might be.

I quickly realised that my C++ raster engine could be of use in this case. It was perfect at loading up very large surfaces and could easily generate a profile given a series of coordinates. It was also possible to compare a given XYZ profile with one generated from my engine to determine if the two profiles were somehow similar. I also formulated a method whereby iterative coordinate translations and rotations were performed on the given profile until a closer match was found with the seabed profile. It took just over half an hour to run all permutations for a given profile.

Fig 1: On top is the seabed profile generated from the raster engine (over a nice piece of coral) and below the absolute vertical component residuals from the comparison with the input profile from seismic data. A negative match is obvious.

It was at this point that I knew that if we were to process this across the entire seismic dataset then we would need to lease time on the dedicated batch processing compute cluster.
Where I worked we had 2 dedicated compute clusters and one was based on Red Hat Enterprise Linux (RHEL) (I'm not so sure what the oher ran but I think it is also now RHEL-based). I used RH (before they went enterprise) when I was at university so it wasn't too dificult to move to KDE from the Ubuntu GNOME environment. We recompiled the raster engine for RHEL and scheduled the jobs for processing on the cluster.

There's no flashy graphics in this post - but almost over-night we managed to process all of the seismic lines and determine at least which ones had dodgy positioning. It turned out that the translation / rotation matching algorithm was only mildly successful (I'm sure given time I could improve it) - we had less than 20% success finding matches using the approach adopted but in most cases the determined match made sense.

No comments:

Post a Comment