
The system comprised a large number of Windows, Unix and Linux servers communicating over wired LAN, microwave WAN (for remote sites), GSM/GPRS cellular networks, the internet, satellite links and radio telemetry. In some cases it also used unusual protocol stacks, such as bi-directional multicasting (for "battle-net"-like technology) and video streaming. The production environment was also replicated, to varying extents, in development, testing, emergency response fall-back (ER) and disaster recovery (DR) environments.
I won't bore you with the details, but the system integrated spatial and non-spatial data, weather station sensors, vehicle, vessel and helicopter tracking, planning, real-time positioning, real-time subsidence monitoring, GPS reference stations and more, and so had quite a large real-time or time-critical component.
We therefore needed some means of periodically monitoring the performance and up-time of the systems and services. We looked at various existing solutions, including Nagios and Big Brother, but in the end we built our own simple, extensible solution in VB6 because we had some specific logical tests that we wanted to run.
The great thing about the solution we came up with was its flexibility. Most tests could be handled straight "out of the box", but custom tests (in the form of scripts and plugins) could also be run. You could even aggregate a number of sub-tests into overview tests and perform logical query tests on databases.
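To give a flavour of the idea, here is a minimal Python sketch. It is not the original VB6 code, and every name in it (Result, query_check, overview, the positions table) is hypothetical. It assumes a test is simply a callable that returns a pass/fail result, so a database query test and an aggregated overview test can be plugged in the same way.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Result:
    """Outcome of one monitoring test (hypothetical structure)."""
    name: str
    ok: bool
    detail: str = ""


# Any callable that takes no arguments and returns a Result can act as a test,
# which is what lets scripts and plugins be dropped in alongside built-in tests.
Check = Callable[[], Result]


def query_check(name: str, conn: sqlite3.Connection, sql: str,
                predicate: Callable[[Any], bool]) -> Check:
    """Build a logical query test: run `sql`, take the first column of the
    first row, and pass if `predicate` accepts that value."""
    def run() -> Result:
        try:
            value = conn.execute(sql).fetchone()[0]
        except sqlite3.Error as exc:
            return Result(name, False, f"query failed: {exc}")
        return Result(name, bool(predicate(value)), f"value={value}")
    return run


def overview(name: str, checks: List[Check]) -> Check:
    """Aggregate several sub-tests into one overview test that passes only
    if every sub-test passes."""
    def run() -> Result:
        results = [check() for check in checks]
        failed = [r.name for r in results if not r.ok]
        if failed:
            return Result(name, False, "failed: " + ", ".join(failed))
        return Result(name, True, "all passed")
    return run
```

A query test might, for example, count position fixes older than some threshold and pass only when that count is zero. Several such tests could then be wrapped in a single overview test for a whole subsystem.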




User Guide / Object Model / Service Engines
One day I'll get around to an open-source .NET version. Stay tuned!