In 2006 I was faced with a problem. Where I worked at the time we were acquiring large volumes of seabed information in the form of discrete multi-beam echo sounder (MBES) data for bathymetry and side scanning sonar to identify seabed features. Our vector database of classified coral outlines had not been updated based on these datasets for a few years and it was a big task to re-interpret the lot.
We were not recording backsctter intensity from the MBES and SSS (side scanning sonar) has some inherently funky geometry which makes it hard to work with and integrtate with MBES for automated feature extraction purposes (although I did some interesting work with this in 1999/2000 that I may post if I find the results). I had some previous experience with terrain analysis and thought I could lend a hand.
I had done quite a bit of work in the past using 2nd differentials of terrain (more commonly known as rugosity). If the first differential of a terrain dataset is slope - or rate of change of the vertical component over a horizontal surface (this is essentially the vertical variability or undulation of the terrain), then the second differential is "slope slope" or the rate of change of the rate of change of the vertical over the horizontal. This is essentially a measure of how rugged a surface is. The theory is that coral tends to exhibit higher rugosity than the surrounding seabed.
Another technique to find the edges of the coral is to perform "inflection" analysis. In this case you take a smoothed lower resolution version of the terrain (spline fitting is often well suited to this) and intersect it with a higher resolution model. The point at which the absolute differences between the two models is zero is often a pretty good approximation of the edge of the Coral. This method sometimes seems to work better than using the 1st differential (high slope) alone.
A combination of slope, rugosity, inflection analysis and various "grow" and "thin" filters eventually results in a fairly accurate extraction of the coral areas - although a bit of tuning is required depending on the the resolution and general properties of the input terrain model .
I had worked on similar problems in the past using custom filters and kernels in ER Mapper and also using ESRI technology in Spatial Analyst. As I was mucking around with GRASS under Linux at the time I decided to use this environment instead. I was extremely impressed! I found it extremely powerful that I could write bash scripts to automate and tie together many processes.
After successful extraction of coral features in GRASS we ran vectorisation of the results in Spatial Analyst, performed some sinuosity filters to remove jagged effects and then updated the coral outlines data in GIS.
The screen capture below shows the results against a dataset over varying quality. It should be noted that thus was taken part way through the extraction process and does not represent the final classification. Unfortunately I could not find a screen cap of what was finally extracted nor could I find the bash scripts we used in GRASS.... I'll keep looking.
Monday, September 22, 2008
Sunday, September 21, 2008
Linux, The Birth of the Raster Engine and Seismic Matching Algorithms
In 2008 I was getting re-acquainted with *nix systems all over again. My primary computer at home was running Ubuntu Linux and OpenSolaris (UNIX) and I was getting into all the great open source software that's out there, for me that meant:
I've always been the type of person who takes work home - and I usually work a few hours in the evenings and on the weekend just tidying things up. I guess it's OK when you're excited about what you're doing and / or you don't have a baby to look after (as my wife and I do now :) ). I don't feel right if I have not laboured over an interesting problem until I am completely knackered or the accompanying glass/es of wine haven't made me give up.
It was during this time that for fun I was writing my own raster engine in C++ - the idea was that I would get used to developing for Linux and also dust the cobwebs off my C++ skills / remember how to use pointers! The engine could read in and out ASCII XYZ / CSVs, ESRI ASCII and ZYCOR ASCII formats as well as address cells by coordinate etc - the usual stuff you would need from a basic raster manipulation library. I also hoped that by targeting Linux I could later use this engine for batch or parallel processing on a cluster of 64bit machines - massively increasing the amount of memory I could address and therefore the size of rasters I could manipulate. I compiled on my Ubuntu OS using g++ (GCC) and during performance testing the engine easily beat the ESRI engine and even the GRASS engine (by a factor exceeding 100).
Fortunately a work problem presented itself for which my newly created raster engine would be perfectly suited...
The geophysicist in our group had noticed that some old seismic data was not correctly positioned (grossly). Throughout the datasets he was looking at it seemed that a number of lines did not match when he compared the seabed picks with real bathymetric data (where we were working at the time this was a fairly reliable method of checking because the seabed had a very distinct profile due to the presence of corals). He asked me if there was an automated way to perform this comparison and if it were possible to even determine what the correct translation and rotation of the seismic data might be.
I quickly realised that my C++ raster engine could be of use in this case. It was perfect at loading up very large surfaces and could easily generate a profile given a series of coordinates. It was also possible to compare a given XYZ profile with one generated from my engine to determine if the two profiles were somehow similar. I also formulated a method whereby iterative coordinate translations and rotations were performed on the given profile until a closer match was found with the seabed profile. It took just over half an hour to run all permutations for a given profile.
Fig 1: On top is the seabed profile generated from the raster engine (over a nice piece of coral) and below the absolute vertical component residuals from the comparison with the input profile from seismic data. A negative match is obvious.
It was at this point that I knew that if we were to process this across the entire seismic dataset then we would need to lease time on the dedicated batch processing compute cluster.
Where I worked we had 2 dedicated compute clusters and one was based on Red Hat Enterprise Linux (RHEL) (I'm not so sure what the oher ran but I think it is also now RHEL-based). I used RH (before they went enterprise) when I was at university so it wasn't too dificult to move to KDE from the Ubuntu GNOME environment. We recompiled the raster engine for RHEL and scheduled the jobs for processing on the cluster.
There's no flashy graphics in this post - but almost over-night we managed to process all of the seismic lines and determine at least which ones had dodgy positioning. It turned out that the translation / rotation matching algorithm was only mildly successful (I'm sure given time I could improve it) - we had less than 20% success finding matches using the approach adopted but in most cases the determined match made sense.
- C++ / Java on Eclipse with K Develop IDEs
- NET programming on Mono
- MySQL database.PHP web programming
- ... and of course the mighty GRASS GIS (which is one of my favourite serious GISs).
I've always been the type of person who takes work home - and I usually work a few hours in the evenings and on the weekend just tidying things up. I guess it's OK when you're excited about what you're doing and / or you don't have a baby to look after (as my wife and I do now :) ). I don't feel right if I have not laboured over an interesting problem until I am completely knackered or the accompanying glass/es of wine haven't made me give up.
It was during this time that for fun I was writing my own raster engine in C++ - the idea was that I would get used to developing for Linux and also dust the cobwebs off my C++ skills / remember how to use pointers! The engine could read in and out ASCII XYZ / CSVs, ESRI ASCII and ZYCOR ASCII formats as well as address cells by coordinate etc - the usual stuff you would need from a basic raster manipulation library. I also hoped that by targeting Linux I could later use this engine for batch or parallel processing on a cluster of 64bit machines - massively increasing the amount of memory I could address and therefore the size of rasters I could manipulate. I compiled on my Ubuntu OS using g++ (GCC) and during performance testing the engine easily beat the ESRI engine and even the GRASS engine (by a factor exceeding 100).
Fortunately a work problem presented itself for which my newly created raster engine would be perfectly suited...
The geophysicist in our group had noticed that some old seismic data was not correctly positioned (grossly). Throughout the datasets he was looking at it seemed that a number of lines did not match when he compared the seabed picks with real bathymetric data (where we were working at the time this was a fairly reliable method of checking because the seabed had a very distinct profile due to the presence of corals). He asked me if there was an automated way to perform this comparison and if it were possible to even determine what the correct translation and rotation of the seismic data might be.
I quickly realised that my C++ raster engine could be of use in this case. It was perfect at loading up very large surfaces and could easily generate a profile given a series of coordinates. It was also possible to compare a given XYZ profile with one generated from my engine to determine if the two profiles were somehow similar. I also formulated a method whereby iterative coordinate translations and rotations were performed on the given profile until a closer match was found with the seabed profile. It took just over half an hour to run all permutations for a given profile.
Fig 1: On top is the seabed profile generated from the raster engine (over a nice piece of coral) and below the absolute vertical component residuals from the comparison with the input profile from seismic data. A negative match is obvious.
It was at this point that I knew that if we were to process this across the entire seismic dataset then we would need to lease time on the dedicated batch processing compute cluster.
Where I worked we had 2 dedicated compute clusters and one was based on Red Hat Enterprise Linux (RHEL) (I'm not so sure what the oher ran but I think it is also now RHEL-based). I used RH (before they went enterprise) when I was at university so it wasn't too dificult to move to KDE from the Ubuntu GNOME environment. We recompiled the raster engine for RHEL and scheduled the jobs for processing on the cluster.
There's no flashy graphics in this post - but almost over-night we managed to process all of the seismic lines and determine at least which ones had dodgy positioning. It turned out that the translation / rotation matching algorithm was only mildly successful (I'm sure given time I could improve it) - we had less than 20% success finding matches using the approach adopted but in most cases the determined match made sense.
Tuesday, July 1, 2008
Bi-Directional Multicast UDP Test Utility (BiDir MPing)
In a previous post (last month) I introduced the System Monitoring Framework that my old team developed and used for monitoring our complex systems environment. One of the systems that we looked after used bi-directional multicast UDP (User Datagram Protocol) - which allowed the system to communicate in a "battle-net" like topology.
Multicast UDP is a great routing protocol and in my opinion is under-utilised by many developers. It is often used in live video streaming where instead of sending a packet of data to each of n clients (i.e. n packets) a single packet is sent to a multicast address which is then distributed at the last possible routing node to the individual clients (therefore limiting the total number of packets routed across the network). See Wikipedia for more info.
Most applications of multicasting (sometimes referred to as multiplexed broadcast) only require packets to be routed from one node to many other nodes (i.e. a 1 to many relationship as in the video streaming example above). In our case however the system used bi-directional multicasting (i.e. a many to many relationship - hence the "battle net" analogy).
Fig 1: Normal UDP. 1 packet is sent to 3 clients resulting in 6 packet sends in total.
Fig 2: Multicast UDP. 1 packet is sent to 3 clients resulting in 4 packet sends in total.
Fig 3: Bi Directional Multicast UDP: Example shows all nodes particpating via the multicast address at the router.
If you want to test multicast UDP there are a number of "ping"-like command utilities available such as mping from Microsoft Research. These however do not test bi-directionality and so, yet again I built my own test utility. To my knowledge this is the ONLY utility available that will directly test bi-directionality of multicast UDP.
The test utility uses a controller which instructs nodes participating in the test to "ping" each other via the multicast address, listen for pings from other nodes and then report back. The results are compiled and analysed by the controller and displayed on screen and/or written to a database for historical reporting.
With the complex network routings we had in our particular system we often found that the programmed settings of routers would change as the network guys fiddled things around or secondary routers took over from primary paths. This meant that we occasionally saw the multicast routes fail to allow bi-directionality and hence the need for the test utility and the ability to report out to a database. The database reporting capability meant that we could also display the results in our System Monitoring framework (see previous post)
Here's the documentation for the utility in case anyone is interested.
Multicast UDP is a great routing protocol and in my opinion is under-utilised by many developers. It is often used in live video streaming where instead of sending a packet of data to each of n clients (i.e. n packets) a single packet is sent to a multicast address which is then distributed at the last possible routing node to the individual clients (therefore limiting the total number of packets routed across the network). See Wikipedia for more info.
Most applications of multicasting (sometimes referred to as multiplexed broadcast) only require packets to be routed from one node to many other nodes (i.e. a 1 to many relationship as in the video streaming example above). In our case however the system used bi-directional multicasting (i.e. a many to many relationship - hence the "battle net" analogy).
Fig 1: Normal UDP. 1 packet is sent to 3 clients resulting in 6 packet sends in total.
Fig 2: Multicast UDP. 1 packet is sent to 3 clients resulting in 4 packet sends in total.
Fig 3: Bi Directional Multicast UDP: Example shows all nodes particpating via the multicast address at the router.
If you want to test multicast UDP there are a number of "ping"-like command utilities available such as mping from Microsoft Research. These however do not test bi-directionality and so, yet again I built my own test utility. To my knowledge this is the ONLY utility available that will directly test bi-directionality of multicast UDP.
The test utility uses a controller which instructs nodes participating in the test to "ping" each other via the multicast address, listen for pings from other nodes and then report back. The results are compiled and analysed by the controller and displayed on screen and/or written to a database for historical reporting.
With the complex network routings we had in our particular system we often found that the programmed settings of routers would change as the network guys fiddled things around or secondary routers took over from primary paths. This meant that we occasionally saw the multicast routes fail to allow bi-directionality and hence the need for the test utility and the ability to report out to a database. The database reporting capability meant that we could also display the results in our System Monitoring framework (see previous post)
Here's the documentation for the utility in case anyone is interested.
Monday, June 2, 2008
SysMon: System Monitoring Framework
My old team used to be responsible for looking after a fairly complex IT system that supported a wide range of activities in the company... everything from surveys, logistics, marine, drilling operations, emergency response, pipeline operations and maintenance to things like planning and permitting. The low-res screen capture above may give you an idea of the physical and logical complexity.
The system comprised of a large number of Windows, Unix and Linux servers communicating over wired LAN, microwave WAN (for remote sites), GSM/GPRS cellular networks, the internet, satellite links and radio telemetry. In some cases it also made use of unusual protocol stacks such as bi-directional multicasting (for "battle-net" like technology) and video streaming. The production environment was also replicated to varying extents to development / testing / emergency response fall-back (ER) and disaster recovery (DR) modes.
I won't bore you with the details but the system integrated spatial and non-spatial data, weather station sensors, vehicle, vessel and helicopter tracking, planning, real-time positioning, real-time subsidence monitoring, GPS reference stations etc etc etc... and so had quite a large real-time or time critical component.
It was necessary therefore that we had some means of periodically monitoring performance and up-time of the systems and services. We looked at various solutions including Nagios / Big Brother etc - but in the end we built our own very extensible and simple solution using VB6 because we had some specific logical tests that we wanted to run.
The great thing about the solution we came up with was that it was extremely flexible and extensible. It allowed custom tests (in the form of scripts and plugins) to be run even though most of the tests could be handled straight "out of the box". You could even aggregate a number of sub tests into overview tests and perform logical query tests on databases.
Guys in the team could be notified of failures via email or SMS allowing them to respond rapidly to problems and at the end of every month we could produce a graphical report for our clients showing system up-time and performance.
I've included the user-guides here so you get the idea:
User Guide Object Model Service Engines
One day I'll get around to an open-sourced .NET version! - Stay tuned.
The system comprised of a large number of Windows, Unix and Linux servers communicating over wired LAN, microwave WAN (for remote sites), GSM/GPRS cellular networks, the internet, satellite links and radio telemetry. In some cases it also made use of unusual protocol stacks such as bi-directional multicasting (for "battle-net" like technology) and video streaming. The production environment was also replicated to varying extents to development / testing / emergency response fall-back (ER) and disaster recovery (DR) modes.
I won't bore you with the details but the system integrated spatial and non-spatial data, weather station sensors, vehicle, vessel and helicopter tracking, planning, real-time positioning, real-time subsidence monitoring, GPS reference stations etc etc etc... and so had quite a large real-time or time critical component.
It was necessary therefore that we had some means of periodically monitoring performance and up-time of the systems and services. We looked at various solutions including Nagios / Big Brother etc - but in the end we built our own very extensible and simple solution using VB6 because we had some specific logical tests that we wanted to run.
The great thing about the solution we came up with was that it was extremely flexible and extensible. It allowed custom tests (in the form of scripts and plugins) to be run even though most of the tests could be handled straight "out of the box". You could even aggregate a number of sub tests into overview tests and perform logical query tests on databases.
Guys in the team could be notified of failures via email or SMS allowing them to respond rapidly to problems and at the end of every month we could produce a graphical report for our clients showing system up-time and performance.
I've included the user-guides here so you get the idea:
User Guide Object Model Service Engines
One day I'll get around to an open-sourced .NET version! - Stay tuned.
Saturday, May 3, 2008
Advection & Dispersion Modeling of Oil Spills
Some time ago I wrote an Oil Spill Advection modeling system for the company I was working for at the time. The system used a hindcast database of wind & current information (direction and speed) to predict the short-term and historic movement of oil spill incidents. It was used for emergency response purposes but was limited to only the advection (movement) component of the spill. NB: I will add a reference to this software in this blog once I dig up some old screen captures. The software ran under an ESRI GIS environment.
After one paticular emergency simulation exercise I decided that in some cases the advection component was just not good enough and we would need to add another dimension to the model in the form of the dispersion of hydrocarbons at the sea surface. I had attended an IMO Oil Spill Management course some time before and understood the basic inputs that govern dispersion and so I spent a couple of days (literally) working on this problem. I wanted to keep things simple and so I limited the dispersion modeling to the following inputs:
A Pseudo-Lagrangian Dispersion Model was used with a "random step" method:
Fig 2: Model output: Hydrocarbon density at a given time-step in the model. Certain details have been edited out to protect the location of the modeling exercise.
After one paticular emergency simulation exercise I decided that in some cases the advection component was just not good enough and we would need to add another dimension to the model in the form of the dispersion of hydrocarbons at the sea surface. I had attended an IMO Oil Spill Management course some time before and understood the basic inputs that govern dispersion and so I spent a couple of days (literally) working on this problem. I wanted to keep things simple and so I limited the dispersion modeling to the following inputs:
- The volume of hydrocarbon spilled.
- The period for which that hydrocarbon is spilled (volume / period = release rate).
- A classification of the type of oil / hydrocarbon spilled. This was based on a profile of the specific gravity or API (American Petroleum Institute) gravity of the hydrocarbon.
- Based on the above classification a linear "% degredation of persistent oil volume per hour" is determined based on a pre-determined degredation curve which in turn is based on various components (i.e. evaporation, dissolution, weathering and bio-degredation).
- Initial Spreading Coefficient (S) (classically derived from 3 interfacial surface tension components which were not available to me when I wrote the prediction model). Various literature indicates a valid range of 0.05 -> 0.2. Both extreme value ranges are therefore modeled by the software.
- Density derived from specific gravity.
- Viscosity (measured).
- Minimum thickness of oil slick (derived from the hydrocarbon type classification - various literature exists).
- Maximum spread radius for a given volume is calculated using the spreading coefficient, density, viscosity and the minimum thickness of a slick.
- Positional randomisation factor for each model step (instantaneous maximum randomisation at each step to account for sea-state and other factors).
- Propagated positional error (over time) = [time lapse] / n (meters).
A Pseudo-Lagrangian Dispersion Model was used with a "random step" method:
- Particle modeling method modified for limited input information.
- Uses base advection / spill trajectory for “movement” as a result of metocean effects.
- Models “spreading”.
- Models “degradation” as a result of evaporation, dissolution and weathering (derived from spill “fate” curves).
- Emulsification effects are indirectly considered in this model (via the spill "fate" curve).
Fig 2: Model output: Hydrocarbon density at a given time-step in the model. Certain details have been edited out to protect the location of the modeling exercise.
Friday, April 4, 2008
RSSToolkit: Environment for creating, aggregating & crawling content for RSS
RSS = "Rich Site Summary" or "Really Simple Syndication" - it's an XML (Extensible Markup Language) based short summary of any type of information and is usually hosted for people to access via the internet. Most mobile phones can read RSS and there are many desktop widgets that use RSS to feed information to them. News, weather and website updates are common types of information that an RSS feed cam contain (GeoRSS is a standard that adds location to an RSS feed). You can even subscribe to this blog via RSS!
Anyway - it was at the time our team was building a vision of what we thought our next-generation information systems should look like that we realised that RSS was going to be an important element. We felt that with desktop widgets and smart phones becoming ever more popular that we could dessiminate information to an even wider client-base if we summarised the info we were collecting in the company and very specifically targeted it at different user groups via RSS. In this way our users would get ONLY the information that was of interest to them in a very succint manner and didn't have to pore through complex, often difficult to use web applications... even better they could get it on the device of their choice!
Field engineers for example could subscribe to feeds that gave them information on the status and operating envelopes of their assets and facilities directly to their mobile phones. No need to go the office, log on, start up an app, select an asset, select the parameters, interpret results... its all there summarised for them ready to go. All they need to concentrate on is doing their job.
You can see that the possibilities are almost endless and we got quite excited about this idea. We realised however that we would eventually be running hundreds of feeds and supporting the services and logic behind them. What we needed was some sort of toolkit that would allow us to easily interrogate databases, websites other feeds etc and aggregate the results into RSS feeds. What we came up with was the "RSS Toolkit". It consisted of:
Screen capture of the IDE showing help documentation and code snippets to the left, VB Script code in the center, Project explorer and properties to the right and debug output at the bottom. The floating window is the command-line daemon.
Here is a copy of the COM Library Object Model
Anyway - it was at the time our team was building a vision of what we thought our next-generation information systems should look like that we realised that RSS was going to be an important element. We felt that with desktop widgets and smart phones becoming ever more popular that we could dessiminate information to an even wider client-base if we summarised the info we were collecting in the company and very specifically targeted it at different user groups via RSS. In this way our users would get ONLY the information that was of interest to them in a very succint manner and didn't have to pore through complex, often difficult to use web applications... even better they could get it on the device of their choice!
Field engineers for example could subscribe to feeds that gave them information on the status and operating envelopes of their assets and facilities directly to their mobile phones. No need to go the office, log on, start up an app, select an asset, select the parameters, interpret results... its all there summarised for them ready to go. All they need to concentrate on is doing their job.
You can see that the possibilities are almost endless and we got quite excited about this idea. We realised however that we would eventually be running hundreds of feeds and supporting the services and logic behind them. What we needed was some sort of toolkit that would allow us to easily interrogate databases, websites other feeds etc and aggregate the results into RSS feeds. What we came up with was the "RSS Toolkit". It consisted of:
- A COM library of objects and functions that handled database query, web crawling, searching and feed aggregation.
- An IDE (Integrated Development Environment) that allowed us to build small scripts on top of the COM library (to add business intelligence).
- A daemon that scheduled and ran the scripts that we built.
Screen capture of the IDE showing help documentation and code snippets to the left, VB Script code in the center, Project explorer and properties to the right and debug output at the bottom. The floating window is the command-line daemon.
Here is a copy of the COM Library Object Model
Subscribe to:
Posts (Atom)