Tuesday, July 1, 2008

Bi-Directional Multicast UDP Test Utility (BiDir MPing)

In a previous post (last month) I introduced the System Monitoring Framework that my old team developed and used for monitoring our complex systems environment. One of the systems that we looked after used bi-directional multicast UDP (User Datagram Protocol), which allowed the system to communicate in a "battle-net"-like topology.

Multicast UDP is a great way of delivering data and in my opinion is under-utilised by many developers. It is often used in live video streaming where, instead of sending a packet of data to each of n clients (i.e. n packets), a single packet is sent to a multicast address and is then replicated at the last possible routing node and delivered to the individual clients (thereby limiting the total number of packets routed across the network). See Wikipedia for more info.

Most applications of multicasting (sometimes referred to as multiplexed broadcast) only require packets to be routed from one node to many other nodes (i.e. a one-to-many relationship, as in the video streaming example above). In our case, however, the system used bi-directional multicasting (i.e. a many-to-many relationship - hence the "battle-net" analogy).

Fig 1: Normal UDP. 1 packet is sent to 3 clients resulting in 6 packet sends in total.

Fig 2: Multicast UDP. 1 packet is sent to 3 clients resulting in 4 packet sends in total.

Fig 3: Bi-Directional Multicast UDP. Example shows all nodes participating via the multicast address at the router.

If you want to test multicast UDP there are a number of "ping"-like command-line utilities available, such as mping from Microsoft Research. These, however, do not test bi-directionality, so yet again I built my own test utility. To my knowledge this is the ONLY utility available that will directly test bi-directionality of multicast UDP.

The test utility uses a controller which instructs nodes participating in the test to "ping" each other via the multicast address, listen for pings from other nodes and then report back. The results are compiled and analysed by the controller and displayed on screen and/or written to a database for historical reporting.
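
To give a flavour of how a participating node works, here is a minimal sketch in Python - the group address, port, message format and timeout are hypothetical choices of mine, not those of the actual utility. The node joins the multicast group, pings the group and then listens for pings from the other nodes so that it can report back to the controller.

    import socket
    import struct

    MCAST_GRP = "239.1.2.3"    # hypothetical multicast group address
    MCAST_PORT = 5007          # hypothetical port
    NODE_ID = "node-A"

    # Create a UDP socket and bind it to the multicast port.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MCAST_PORT))

    # Join the multicast group on all interfaces.
    mreq = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    # Send a ping to the group - every node that has joined should hear it.
    sock.sendto(f"PING {NODE_ID}".encode(), (MCAST_GRP, MCAST_PORT))

    # Listen for pings from the other nodes and record which ones were heard.
    sock.settimeout(5.0)
    heard = set()
    try:
        while True:
            data, addr = sock.recvfrom(1024)
            parts = data.decode().split()
            if parts[0] == "PING" and parts[1] != NODE_ID:
                heard.add(parts[1])
    except socket.timeout:
        pass

    # In the real utility this result would be reported back to the controller.
    print(f"{NODE_ID} heard pings from: {sorted(heard)}")

If every node hears pings from every other node then the multicast routing is bi-directional; if node A hears B but B does not hear A, the controller can flag exactly which path has failed.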

With the complex network routings we had in our particular system we often found that the programmed settings of routers would change as the network guys fiddled things around or secondary routers took over from primary paths. This meant that we occasionally saw the multicast routes fail to allow bi-directionality - hence the need for the test utility and its ability to report out to a database. The database reporting capability also meant that we could display the results in our System Monitoring framework (see previous post).

Here's the documentation for the utility in case anyone is interested.

Monday, June 2, 2008

SysMon: System Monitoring Framework

My old team used to be responsible for looking after a fairly complex IT system that supported a wide range of activities in the company... everything from surveys, logistics, marine, drilling operations, emergency response, pipeline operations and maintenance to things like planning and permitting. The low-res screen capture above may give you an idea of the physical and logical complexity.

The system comprised a large number of Windows, Unix and Linux servers communicating over wired LAN, microwave WAN (for remote sites), GSM/GPRS cellular networks, the internet, satellite links and radio telemetry. In some cases it also made use of unusual protocol stacks such as bi-directional multicasting (for "battle-net"-like technology) and video streaming. The production environment was also replicated, to varying extents, to development / testing, emergency response fall-back (ER) and disaster recovery (DR) environments.

I won't bore you with the details but the system integrated spatial and non-spatial data, weather station sensors, vehicle, vessel and helicopter tracking, planning, real-time positioning, real-time subsidence monitoring, GPS reference stations etc etc etc... and so had quite a large real-time or time critical component.

We therefore needed some means of periodically monitoring the performance and up-time of the systems and services. We looked at various solutions, including Nagios, Big Brother, etc. - but in the end we built our own very extensible and simple solution in VB6, because we had some specific logical tests that we wanted to run.

The great thing about the solution we came up with was that it was extremely flexible and extensible. It allowed custom tests (in the form of scripts and plugins) to be run, even though most of the tests could be handled straight "out of the box". You could even aggregate a number of sub-tests into overview tests and perform logical query tests on databases.
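
To illustrate the idea only (the original was written in VB6, so the class and test names below are hypothetical), a custom test plus an aggregating overview test might look something like this:

    import subprocess
    from dataclasses import dataclass

    @dataclass
    class TestResult:
        name: str
        ok: bool
        detail: str = ""

    class PingTest:
        """An 'out of the box' style test: is a host reachable?"""
        def __init__(self, name, host):
            self.name, self.host = name, host
        def run(self):
            # Assumes a Unix-like host ("-c 1" = send one echo request).
            ok = subprocess.call(["ping", "-c", "1", self.host],
                                 stdout=subprocess.DEVNULL) == 0
            return TestResult(self.name, ok, f"host={self.host}")

    class AggregateTest:
        """An overview test that only passes if all of its sub-tests pass."""
        def __init__(self, name, subtests):
            self.name, self.subtests = name, subtests
        def run(self):
            results = [t.run() for t in self.subtests]
            failed = [r.name for r in results if not r.ok]
            return TestResult(self.name, not failed, f"failed={failed}")

    if __name__ == "__main__":
        overview = AggregateTest("GIS servers",
                                 [PingTest("gis1", "192.0.2.10"),
                                  PingTest("gis2", "192.0.2.11")])
        print(overview.run())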

Guys in the team could be notified of failures via email or SMS, allowing them to respond rapidly to problems, and at the end of every month we could produce a graphical report for our clients showing system up-time and performance.

I've included the user guides here so you get the idea:
User Guide | Object Model | Service Engines

One day I'll get around to an open-sourced .NET version! - Stay tuned.

Saturday, May 3, 2008

Advection & Dispersion Modeling of Oil Spills

Some time ago I wrote an Oil Spill Advection modeling system for the company I was working for at the time. The system used a hindcast database of wind & current information (direction and speed) to predict the short-term and historic movement of oil spill incidents. It was used for emergency response purposes but was limited to only the advection (movement) component of the spill. NB: I will add a reference to this software in this blog once I dig up some old screen captures. The software ran under an ESRI GIS environment.

After one particular emergency simulation exercise I decided that in some cases the advection component was just not good enough and we would need to add another dimension to the model in the form of the dispersion of hydrocarbons at the sea surface. I had attended an IMO Oil Spill Management course some time before and understood the basic inputs that govern dispersion, so I spent a couple of days (literally) working on this problem. I wanted to keep things simple and so I limited the dispersion modeling to the following inputs:
  • The volume of hydrocarbon spilled.
  • The period for which that hydrocarbon is spilled (volume / period = release rate).
  • A classification of the type of oil / hydrocarbon spilled. This was based on a profile of the specific gravity or API (American Petroleum Institute) gravity of the hydrocarbon.
  • Based on the above classification, a linear "% degradation of persistent oil volume per hour" is determined from a pre-determined degradation curve, which in turn is based on various components (i.e. evaporation, dissolution, weathering and bio-degradation).
  • Initial Spreading Coefficient (S), classically derived from 3 interfacial surface tension components which were not available to me when I wrote the prediction model. Various literature indicates a valid range of 0.05 to 0.2; both extremes of this range are therefore modeled by the software.
  • Density derived from specific gravity.
  • Viscosity (measured).
  • Minimum thickness of oil slick (derived from the hydrocarbon type classification - various literature exists).
  • Maximum spread radius for a given volume is calculated using the spreading coefficient, density, viscosity and the minimum thickness of a slick.
  • Positional randomisation factor for each model step (instantaneous maximum randomisation at each step to account for sea-state and other factors).
  • Propagated positional error (over time) = [time lapse] / n (meters).
Fig 1: Oil Spill Fates (Persistence) for different oil types over time. The graphic has been modified from its original to show only generic hydrocarbon types.

A Pseudo-Lagrangian Dispersion Model was used with a "random step" method (a rough sketch of one model step follows the list below):
  • Particle modeling method modified for limited input information.
  • Uses base advection / spill trajectory for “movement” as a result of metocean effects.
  • Models “spreading”.
  • Models “degradation” as a result of evaporation, dissolution and weathering (derived from spill “fate” curves).
  • Emulsification effects are indirectly considered in this model (via the spill "fate" curve).
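
As a rough sketch of what one model step looks like - the 3% wind-drift factor, the inputs and the data structures below are my own simplifications, not the original ESRI-based implementation:

    import math
    import random

    def step_particles(particles, wind, current, dt_hours,
                       degradation_pct_per_hour, max_random_step_m):
        """Advance surface-oil 'particles' by one model time step.

        particles: list of dicts with keys x, y (metres) and volume (m3)
        wind, current: (speed_m_per_s, direction_deg) tuples; a simple 3%
                       wind-drift factor is assumed here.
        """
        surviving = []
        for p in particles:
            # Advection: full current plus a fraction of the wind speed.
            for (speed, direction), factor in ((current, 1.0), (wind, 0.03)):
                rad = math.radians(direction)
                p["x"] += factor * speed * dt_hours * 3600 * math.sin(rad)
                p["y"] += factor * speed * dt_hours * 3600 * math.cos(rad)

            # Random step to emulate spreading, sea-state and other factors.
            angle = random.uniform(0, 2 * math.pi)
            dist = random.uniform(0, max_random_step_m)
            p["x"] += dist * math.cos(angle)
            p["y"] += dist * math.sin(angle)

            # Linear degradation taken from the oil-type "fate" curve.
            p["volume"] *= max(0.0, 1 - degradation_pct_per_hour / 100.0 * dt_hours)
            if p["volume"] > 0:
                surviving.append(p)
        return surviving

Binning the surviving particle volumes into grid cells at each time step gives a surface hydrocarbon density picture of the kind shown in Fig 2 below.
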
The screen capture below shows the model output in terms of surface hydrocarbon density at a particular time-instance in the model:

Fig 2: Model output: Hydrocarbon density at a given time-step in the model. Certain details have been edited out to protect the location of the modeling exercise.

Friday, April 4, 2008

RSSToolkit: Environment for creating, aggregating & crawling content for RSS

RSS = "Rich Site Summary" or "Really Simple Syndication" - it's an XML (Extensible Markup Language) based short summary of any type of information and is usually hosted for people to access via the internet. Most mobile phones can read RSS and there are many desktop widgets that use RSS to feed information to them. News, weather and website updates are common types of information that an RSS feed cam contain (GeoRSS is a standard that adds location to an RSS feed). You can even subscribe to this blog via RSS!

Anyway - it was at the time our team was building a vision of what we thought our next-generation information systems should look like that we realised RSS was going to be an important element. We felt that, with desktop widgets and smart phones becoming ever more popular, we could disseminate information to an even wider client base if we summarised the info we were collecting in the company and targeted it very specifically at different user groups via RSS. In this way our users would get ONLY the information that was of interest to them, in a very succinct manner, and didn't have to wade through complex, often difficult-to-use web applications... even better, they could get it on the device of their choice!

Field engineers, for example, could subscribe to feeds that delivered information on the status and operating envelopes of their assets and facilities directly to their mobile phones. No need to go to the office, log on, start up an app, select an asset, select the parameters, interpret results... it's all there summarised for them, ready to go. All they need to concentrate on is doing their job.

You can see that the possibilities are almost endless and we got quite excited about this idea. We realised, however, that we would eventually be running hundreds of feeds and supporting the services and logic behind them. What we needed was some sort of toolkit that would allow us to easily interrogate databases, websites, other feeds, etc., and aggregate the results into RSS feeds. What we came up with was the "RSS Toolkit". It consisted of:
  1. A COM library of objects and functions that handled database query, web crawling, searching and feed aggregation.
  2. An IDE (Integrated Development Environment) that allowed us to build small scripts on top of the COM library (to add business intelligence).
  3. A daemon that scheduled and ran the scripts that we built.
The COM library was at the heart of the toolkit and the IDE used the Microsoft scripting runtime, so the main scripting languages supported were VBScript and JScript. You could, however, also write in any .NET language and in Python - although these were not supported by the IDE or the daemon.
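
As a flavour of what one of those scripts did - written here in Python rather than VBScript, and with hypothetical feed and database content - aggregating query results into an RSS 2.0 document is essentially this:

    import xml.etree.ElementTree as ET
    from email.utils import formatdate

    def build_rss(title, link, items):
        """Build a minimal RSS 2.0 document from (title, link, description) tuples."""
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = title
        ET.SubElement(channel, "link").text = link
        ET.SubElement(channel, "lastBuildDate").text = formatdate()
        for item_title, item_link, description in items:
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = item_title
            ET.SubElement(item, "link").text = item_link
            ET.SubElement(item, "description").text = description
        return ET.tostring(rss, encoding="unicode")

    # Hypothetical rows that might come back from an asset-status database query.
    rows = [
        ("Compressor A: within operating envelope", "http://intranet/assets/a", "OK"),
        ("Pipeline P1: pressure alarm", "http://intranet/assets/p1", "Check required"),
    ]
    print(build_rss("Facility status", "http://intranet/feeds/facility", rows))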

Screen capture of the IDE showing help documentation and code snippets to the left, VB Script code in the center, Project explorer and properties to the right and debug output at the bottom. The floating window is the command-line daemon.

Here is a copy of the COM Library Object Model.

Monday, October 1, 2007

ROV Inspection of Pipelines - GIS Data Model

My philosophy has always been that, where possible, ALL spatial data should be managed in a spatial database management system. Where I used to work this was ESRI's ArcSDE on top of Oracle running on a Unix or Linux server.

Our approach to pipeline data was no different. We used an APDM-inspired data model and also hooked into the pipeline engineering group's inspection and maintenance database - so we could also integrate intelligent pigging data. Our periodic ROV (Remotely Operated Vehicle - see above picture) surveys would churn out hundreds of kilometres (= hours = terabytes) of video data. Specialist (proprietary) software solutions were required to review the data. Commonly used systems are VisualReview (from VisualSoft) and Starfix.DVS (from Fugro). Each of these systems stores and manages data differently. None of them are GIS based (or were at the time). So you can see that at a certain point it is possible for a company to be running multiple systems for essentially the same type of data.

We solved this by deciding that the data should be delivered by the ROV survey contractor in a format that we could easily suck into the spatial database and then integrate with video. We started by allowing the spatial / time / KP tagged "event" data to be delivered in a simple ASCII CSV or UKOOA P5/94 form. The video data files were named in such a way that they indicated start of acquisition time for the video fragment. Cross profiles and current density profiles were similarly named and delivered in ASCII form.
We built a routine that loaded the video data onto NAS storage and indexed the video files against KP and time in the database - meaning that for any location, time or KP we could find the appropriate video and go straight to the point within the video that corresponded with the location on the pipeline. Essentially we geo-coded the videos. There are now standards developed by Sony that include geo-coded tags in H.264 (AVCHD / MPEG-4) video files and I may do something with these standards in the future (stay tuned).
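
The indexing idea itself is simple. Here is a minimal sketch - the file-naming convention below is hypothetical, and the real routine stored the index in the spatial database rather than in memory:

    from datetime import datetime

    # Video fragments named by acquisition start time, e.g. "20071001_133000.mpg".
    def parse_start(filename):
        return datetime.strptime(filename.split(".")[0], "%Y%m%d_%H%M%S")

    def locate_event(event_time, video_files):
        """Return the video fragment covering event_time and the offset into it."""
        candidate = None
        for start, name in sorted((parse_start(f), f) for f in video_files):
            if start <= event_time:
                candidate = (start, name)
            else:
                break
        if candidate is None:
            return None
        start, name = candidate
        return name, event_time - start

    videos = ["20071001_133000.mpg", "20071001_140000.mpg", "20071001_143000.mpg"]
    print(locate_event(datetime(2007, 10, 1, 14, 12, 30), videos))
    # -> ('20071001_140000.mpg', datetime.timedelta(seconds=750))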

The video index was extremely useful because it also meant that we could automatically run a query to identify videos in which no major pipeline inspection events were observed and archive them off-line (hence saving storage space). Our full online store was up to 6 TB!

Now that the data was safely stored in our spatial database, it was important for us to provide a tool to our users that would allow them to review the ROV survey data using a GUI metaphor they were familiar with. We chose to emulate the design of VisualReview / Starfix.DVS and built an ArcObjects application that sat on top of the data in ArcSDE / the video store. The resultant GUI can be seen in the screen capture below. It includes a synchronised map display (ArcMap component), video cameras, current density profile, vertical profile, observed events and cross profiles.


Saturday, July 15, 2006

Improving on CUBE for Bathymetric Modeling

In a previous post I briefly described a parallel processing approach to bathymetric estimation from soundings. In that post I mentioned that I believed I had improved on the CUBE (Combined Uncertainty Bathymetric Estimation) algorithm. Well, it's not really fair to make a statement like that without at least trying to substantiate it - so here goes.

CUBE is really designed to model uncertainty and create bathymetric terrain models from multi-beam echo sounder (MBES) data. It is implemented in software products such as Fledermaus. It can perform estimation on multiple MBES datasets and can even (so I've been told recently - although I have not yet tested it myself) account for temporal uncertainty (i.e. a more recent dataset has less uncertainty than an older dataset). It is also possible to utilise sources of data other than MBES, as long as you pre-attribute the uncertainty values yourself.

The CUBE algorithm uses a sensor model to assign uncertainties in the vertical and horizontal domains to each sounding before estimation. For example, the outer beams of the MBES swath have less certainty in both the horizontal and vertical domains due to the geometry of a discrete-beam MBES system.

Fig 1: Typical uncertainty in the vertical domain for MBES data.

What I wanted to achieve was a CUBE-like algorithm suitable for highly heterogeneous sounding data sets (i.e. a combination of single-beam echo sounder data, MBES data and even data picked from seismic acquired over multiple years / decades). I wanted the algorithm to output a bathymetric surface with even wide "holes" interpolated, therefore employing adaptive kernel estimation rather than the discrete binning used by CUBE. The items below outline the improvements made to, or similarities with, the base CUBE algorithm:

On-The-Fly Outlier Detection:
It was important to me that the algorithm could be set up to perform pinnacle or "spike" / outlier filtering, thereby invalidating points for inclusion before the kernel estimation takes place.

Non-Square Kernels:
CUBE uses a square kernel - I wanted to use a more natural adaptive circular kernel. A possible future improvement is to utilise adaptive elliptical kernels that are determined based on anisotropic biases.

Managing Vertical / Horizontal and Temporal Uncertainties:
As with CUBE the algorithm requires the bathymetric soundings to be attributed as follows:
Easting, Northing, Depth, Time, TVU, THU.
TVU = Total Vertical Uncertainty, THU = Total Horizontal Uncertainty.

The TVU and THU values can be determined in a pre-processor using a sensor model and various other inputs (e.g. validity of velocity correction etc).
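
For illustration only, one possible way of collapsing TVU, THU and age into a single per-sounding weight is an inverse-variance style formula. The coefficients below are hypothetical "applicability ratios" of the kind the algorithm lets you tune per scenario (see later), not values from the actual implementation:

    def sounding_weight(tvu_m, thu_m, age_years,
                        w_tvu=1.0, w_thu=0.5, w_age=0.1):
        """Larger uncertainties and older data give smaller weights."""
        return 1.0 / (w_tvu * tvu_m ** 2 + w_thu * thu_m ** 2 + w_age * age_years + 1e-9)

    print(sounding_weight(tvu_m=0.3, thu_m=1.0, age_years=2))    # recent, tight sounding
    print(sounding_weight(tvu_m=1.5, thu_m=5.0, age_years=20))   # old, loose sounding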

Adaptive Kernels Using Conditions:
The algorithm then requires a series of estimation "scenarios" to be defined. These are the parameters by which the adaptive kernel estimation is enabled. They basically describe fall-back options should the previous estimation condition not be met.

For example: if I have an initial condition that says I must make an estimation for a 10 x 10m area using a minimum of 10 soundings and I only have 7, then my second scenario may relax the conditions (a sketch of this fall-back loop follows the list below):
  • Accept a minimum of 5 soundings, OR
  • Widen the search area for soundings from a 10m radius to a 15m radius.
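
A minimal sketch of that fall-back loop - the scenario parameters and the plain-mean stand-in for the weighted kernel estimate are hypothetical:

    import math

    def estimate_depth(cell_x, cell_y, soundings, scenarios):
        """Try each scenario in order until one yields enough soundings.

        soundings: list of (x, y, depth) tuples
        scenarios: list of dicts with "radius_m" and "min_soundings"
        Returns (depth_estimate, scenario_index) or None if every scenario fails.
        """
        for i, sc in enumerate(scenarios):
            depths = [d for (x, y, d) in soundings
                      if math.hypot(x - cell_x, y - cell_y) <= sc["radius_m"]]
            if len(depths) >= sc["min_soundings"]:
                # A weighted kernel estimate would be made here; a plain mean
                # stands in for it in this sketch.
                return sum(depths) / len(depths), i
        return None

    scenarios = [{"radius_m": 10, "min_soundings": 10},   # preferred condition
                 {"radius_m": 10, "min_soundings": 5},    # relax the sounding count
                 {"radius_m": 15, "min_soundings": 5}]    # widen the search radius
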
Managing Anisotropic Bias:
Anisotropic bias is a bias that is introduced due to the directionality or specific spatial distribution of points. It is common in single-beam data but the effect is smaller in well-acquired MBES data due to the more homogeneous distribution of points. Anisotropic bias is also a problem at the interface between surveys of relatively high and low sounding density.

Desirable Case:

In this case all soundings are distributed evenly amongst four quadrants around the center of the grid cell. This forms the MOST desirable scenario for sounding distribution.

Least Desirable Case:

In this case all soundings are distributed only within a single quadrant about the center of the grid cell - this can yield a bathymetric solution that is not fully constrained and represents the LEAST desirable scenario for sounding distribution.


Acceptable Case:

In this case although soundings are not distributed amongst all four quadrants (like the most desirable scenario), they DO provide a fully distributed pattern that allows a depth estimate to be confidently determined.


Not Confident Solution:

Although soundings can be found in TWO quadrants, this scenario DOES NOT provide a distribution that allows a confident depth estimate at the center of the grid cell.


Unconfident Solution:

Although soundings can be found in THREE quadrants, this scenario still provides a distribution that DOES NOT constitute a confident determination of depth at the center of the grid cell.

The difference between the maximum clockwise angle from grid north to any sounding and the minimum angle measured in the same way does not equal or exceed 180 degrees, and therefore this scenario is deemed to have a distribution that DOES NOT yield a confident depth estimate.


Confident Solution:

The distribution pattern in this scenario shows soundings that are distributed amongst three quadrants about the center of the grid cell. In this case, however, the depth estimate solution can be considered confident, as the angular spread of the soundings about the center of the grid cell exceeds 180 degrees.

Extended Search For 180 Degree Rule:

In the case shown in the diagram above, the initial sounding set did not yield a confident distribution of points based on the 180 degree rule. The search radius was extended and one more sounding was discovered, which allowed the 180 degree rule to be met. All additional points in the extended search that DID NOT contribute to the 180 degree rule being achieved are EXCLUDED from the final sounding set.
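
A minimal sketch of the 180 degree test itself, as stated above (bearings are measured clockwise from grid north; note that this simple max-minus-min form does not handle wrap-around at the 0/360 boundary):

    import math

    def bearing_deg(cx, cy, x, y):
        """Clockwise bearing from grid north, in [0, 360)."""
        return math.degrees(math.atan2(x - cx, y - cy)) % 360

    def spread_is_confident(cx, cy, soundings, threshold_deg=180.0):
        """True if the angular spread of soundings about the cell centre
        (cx, cy) equals or exceeds the threshold."""
        bearings = [bearing_deg(cx, cy, x, y) for (x, y) in soundings]
        return (max(bearings) - min(bearings)) >= threshold_deg

    # One sounding per quadrant easily satisfies the rule...
    print(spread_is_confident(0, 0, [(5, 5), (5, -5), (-5, -5), (-5, 5)]))   # True
    # ...whereas soundings clustered in a single quadrant do not.
    print(spread_is_confident(0, 0, [(2, 8), (5, 5), (8, 2)]))               # False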

Tuning of Applicability Ratios:
The applicability of each uncertainty (i.e. the weighting it has in the resultant estimation) can be tuned on a per-scenario basis - allowing the user to have great control over how the estimation result is derived:
The overall algorithm flow is defined below:

Tuesday, June 13, 2006

A Parallel Bathymetric Processing System

A Parallel, Utility-Based Computing Approach To Bathymetric Navigation Surface Creation From Heterogeneous & Distributed Bathymetric Sounding Databases.

I've often had to deal with truly large bathymetric sounding databases. With the increasing use of multi-beam echo sounders (MBES), bathy databases are just getting larger and larger and often contain many years of data of varying quality. What happens if you want to make use of all this data to create new bathymetric products? You need to account for the various uncertainties in quality, temporal variation, accuracy, etc.

Modern processing algorithms such as CUBE (Combined Uncertainty Bathymetric Estimation) attempt to account for these types of uncertainties and can derive bathy products such as "safe navigation surfaces" from such heterogeneous data.

When these databases get very large - or indeed if they are distributed across multiple instances or sites - it becomes very difficult to load and process all of the input data. This got me thinking about using parallel processing techniques (HPC - High Performance Computing, etc.) to undertake and speed up such tasks. I reviewed many parallel processing topologies and came to the conclusion that for the occasional processing job a dedicated compute cluster would not be necessary. Why not just make use of the many thousands of computers that are essentially idle in the office when everyone goes home? Just like the SETI project or a loosely coupled Beowulf cluster (also known as a utility cluster or a Network of Workstations [NOW]).

Well I got to fully designing a modified CUBE algorithm (I think its an improvement), data flows, object models, data models etc and I began writing the code for it all. After a lot of effort (and a 41 page proposal document) on my first test run I realised that the Control Database was going to be a bottleneck so I started a complete re-design which I will post at some stage in the future when its all developed a bit more.