This blog post was updated on 7/6/2019
The sheer amount of data generated by the global AIS network is truly overwhelming: more than half a billion positions are received daily from various sources around the world. But as this comes from a huge range of aggregated sources with varying degrees of quality, from dedicated receiver stations to maritime enthusiasts and AIS satellites, this raw data flow may contain errors, incomplete information and contradictions. Ships might be misidentified, jump backwards and forwards in time, or disappear altogether and re-appear.
To provide users with accurate and actionable data, MarineTraffic uses real-time filtering and analysis combined with post-processing. The chaotic raw data flow is filtered down to 30 million positions each day: one position per ship, per minute. Here are some of the techniques MarineTraffic uses to guarantee the accuracy of all published position data.
1. Remove all junk data
AIS information is transmitted every few seconds from each vessel by VHF radio signals, which typically have a terrestrial range of just 20-30 miles. When receivers pick these signals up, packets may be incomplete or improperly formatted. The first stage of filtering removes all data that is obviously corrupted, incomplete or doesn’t conform to the AIS standard.
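As a rough illustration of this first stage, the sketch below checks a single received sentence for basic shape and for a matching checksum. It assumes the data arrives as NMEA 0183 AIVDM/AIVDO text sentences, which is the usual receiver output format; the names `nmea_checksum` and `is_well_formed` are hypothetical and the pattern is deliberately simplified. It is not MarineTraffic's actual filter.

```python
import re

# Simplified shape of an AIVDM/AIVDO sentence (an assumption for this sketch):
# "!", talker and type, fragment count, fragment number, optional message id,
# optional channel, armoured payload, fill bits, "*", two hex checksum digits.
_SENTENCE = re.compile(
    r"^!(AIVD[MO],[1-9],[1-9],[0-9]?,[AB12]?,[0-W`-w]+,[0-5])\*([0-9A-Fa-f]{2})$"
)


def nmea_checksum(body: str) -> str:
    """XOR every character between '!' and '*'; return two uppercase hex digits."""
    value = 0
    for ch in body:
        value ^= ord(ch)
    return f"{value:02X}"


def is_well_formed(sentence: str) -> bool:
    """Return True if the sentence has the expected shape and a matching checksum."""
    match = _SENTENCE.match(sentence.strip())
    if match is None:
        return False  # truncated, wrong field count, or illegal characters
    body, checksum = match.groups()
    return nmea_checksum(body) == checksum.upper()


# Build a sentence with a correct checksum, then corrupt one payload character.
body = "AIVDM,1,1,,A,15M67FC000G?ufbE`FepT@3n00Sa,0"
good = f"!{body}*{nmea_checksum(body)}"
bad = good.replace("15M6", "15M7", 1)
print(is_well_formed(good), is_well_formed(bad))
```

A check like this only catches transmission damage and malformed packets. A sentence can pass it and still carry a wrong position or identity, which is what the later stages address.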
2. Identify each vessel correctly
The AIS system transmits each vessel’s unique identifiers, such as the International Maritime Organisation (IMO) number, the Maritime Mobile Service Identity (MMSI) number and the vessel Call Sign.
As ships change names or operators, update their AIS information or are decommissioned, MarineTraffic proactively updates and removes outdated or erroneous records to maintain an accurate database of over 500,000 AIS-equipped vessels, each with valid supplementary information.
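One simple way to spot erroneous identifiers is to test them against their published formats. The sketch below assumes two such checks: the IMO number's check digit (the first six digits weighted 7 down to 2, with the last digit of the sum matching the seventh digit) and the nine-digit length of an MMSI. The function names are hypothetical, and nothing here describes how MarineTraffic itself validates records.

```python
import re


def is_valid_imo(imo: str) -> bool:
    """Check the seven-digit IMO number format and its check digit."""
    text = imo.strip().upper()
    if text.startswith("IMO"):
        text = text[3:].strip()  # accept "IMO 9074729" as well as "9074729"
    if re.fullmatch(r"[0-9]{7}", text) is None:
        return False
    # Weights 7, 6, 5, 4, 3, 2 apply to the first six digits.
    total = sum(int(d) * w for d, w in zip(text[:6], range(7, 1, -1)))
    return total % 10 == int(text[6])


def is_plausible_mmsi(mmsi: str) -> bool:
    """Check only that the MMSI is exactly nine ASCII digits."""
    return re.fullmatch(r"[0-9]{9}", mmsi.strip()) is not None


print(is_valid_imo("IMO 9074729"))  # weighted sum is 139, check digit is 9
print(is_valid_imo("9074728"))      # same digits with a wrong check digit
```

Format checks of this kind catch typing and transmission errors, but not a well-formed identifier that has been duplicated or deliberately altered.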
In September 2014, MarineTraffic introduced the Ship ID scheme, which aims to provide a more authoritative and unique identifier for each sea-going vessel. This has gained traction with the industry, as users are increasingly asking for the MarineTraffic Ship ID over other common identifiers which can be duplicated or manipulated. Increasing the quality of this data set is part of the effort to ensure that the correct identity and positions are allocated to each vessel.
3. Real-time spatial and temporal correlation analysis
MarineTraffic operates the largest AIS terrestrial receiving station network in the world. In addition, MarineTraffic has data sharing agreements with a number of third parties in order to extend its reach. This variety of collection methods, coupled with inherent AIS protocol deficiencies, affects the quality of data received.
AIS transmissions from the best providers can reach MarineTraffic servers almost immediately, while data from more remote stations with slower connections to the global network takes longer to arrive. AIS satellites sometimes wait hours before their data can be offloaded to the next ground station in range.
Combined with the fact that AIS timestamps are synchronised locally rather than globally, this means the MarineTraffic system can be presented with widely varying spatial and temporal information about each vessel. The system weighs the quality of each data source and attempts to create a coherent timeline in order to establish a clear picture of where each vessel is at a given time. Each vessel’s movement is constantly evaluated and if its speed, course or any other factor falls outside accepted boundaries, the data is flagged and automatically or manually removed.
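To make the idea of an "accepted boundary" concrete, here is a minimal sketch of one such test: order a vessel's reports by timestamp, then flag any report that would require an impossible speed to reach from the last accepted position. The `Fix` type, the function names and the 60-knot ceiling are assumptions chosen for illustration, not MarineTraffic's rules, and a real system would also weigh source quality, course and other factors.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles
MAX_SPEED_KNOTS = 60.0      # assumed ceiling for this sketch only


@dataclass(frozen=True)
class Fix:
    timestamp: float  # seconds since the epoch (UTC)
    lat: float        # degrees
    lon: float        # degrees


def distance_nm(a: Fix, b: Fix) -> float:
    """Great-circle (haversine) distance between two fixes in nautical miles."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_NM * asin(sqrt(h))


def flag_implausible(fixes, max_speed=MAX_SPEED_KNOTS):
    """Return the fixes that imply a speed above max_speed from the last accepted fix."""
    flagged = []
    last_good = None
    # Reports may arrive out of order, so rebuild the timeline first.
    for fix in sorted(fixes, key=lambda f: f.timestamp):
        if last_good is not None:
            hours = (fix.timestamp - last_good.timestamp) / 3600.0
            # A repeated timestamp is treated as a conflict and flagged as well.
            if hours <= 0 or distance_nm(last_good, fix) / hours > max_speed:
                flagged.append(fix)
                continue
        last_good = fix
    return flagged


track = [
    Fix(120, 40.00, 23.64),  # roughly 120 nm away one minute later: a jump
    Fix(0, 37.94, 23.64),
    Fix(180, 37.96, 23.64),
    Fix(60, 37.95, 23.64),
]
print(flag_implausible(track))
```

Comparing each report against the last accepted fix, rather than simply the previous one, keeps a single bad position from causing the good report that follows it to be flagged too.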
With each stage of evaluation, the rules get stricter. Post-processing is a series of functions carried out on a daily basis to catch any data that is not valid, yet still managed to pass through the real-time filters. By this stage, the system is focussed on special cases: very small deviations not picked up by the real-time processing. These even more stringent tests make sure that no erroneous positions are present before the data is re-checked, re-classified and ultimately stored in the historical data set.
Over time, as these combined processes have proven their effectiveness and become even more sophisticated, MarineTraffic has been able to move from reporting accurate positions for each vessel every five minutes, to every two minutes and now once per minute, achieving a more granular level of information without sacrificing data quality. Taken together, this multi-stage process produces increasingly accurate positions that are guaranteed to be true to the best of MarineTraffic’s ability.
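For readers who want to see what "one position per ship, per minute" can look like in practice, the sketch below thins a stream of already-validated reports to a single report per vessel per minute, keeping the latest one in each minute. The tuple layout and the name `one_per_minute` are hypothetical, and choosing the latest report rather than, say, the one from the best source is an assumption made here for simplicity.

```python
def one_per_minute(reports):
    """Keep the latest report per (ship_id, minute).

    Each report is a tuple: (ship_id, timestamp_seconds, lat, lon).
    """
    latest = {}
    for report in reports:
        ship_id, ts = report[0], report[1]
        key = (ship_id, int(ts // 60))  # minute bucket for this vessel
        if key not in latest or ts > latest[key][1]:
            latest[key] = report
    # Return a stable order: by vessel, then by time.
    return sorted(latest.values(), key=lambda r: (r[0], r[1]))


reports = [
    ("A", 0, 37.940, 23.640),
    ("A", 30, 37.941, 23.640),  # same minute as the report above, and later
    ("A", 61, 37.942, 23.640),  # next minute
    ("B", 10, 51.900, 4.480),
]
for row in one_per_minute(reports):
    print(row)
```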