Data validation in the Railways

6 March 2013

The Genesis

Verifying railways systems covers many aspects and requires a large number of cross-verifications, performed by a wide range of actors including the designer of the system, the company in charge of its exploitation, the certification body, etc. Even if complete automation is not possible, any automatic verification is welcome as it helps to improve the overall level of confidence.

Indeed a railways system is a collection of highly dependent sub-system specification and these dependencies need to be checked. They may be based on railways signalling rules (that are specific to every country or even every company in a single country), on rolling stock features (constant or variable train size or configuration) and exploitation conditions.

In France, AQL RATP laboratory initiated the development of a generic tool to verify trackside data for the metro line 1 in Paris that is being automated. This tool, based on the PredicateB predicate evaluator, is able to parse data (XML, csv or text-based formats), load rules and verify that data complies with rules. Initially tested on line 13 configuration data, the tool has been able to check 400 definitions and 125 rules in 5 minutes.

During the DEPLOY project, the University of Düsseldorf and Siemens Transportation Systems have elaborated a new approach, based on the ProB model checker to dramatically reduce validation duration from about six months to some minutes. Data is extracted from ADA source code and properties come from B models. In the case of the San Juan project, 79 files with a total of 23,000 lines of B are parsed to extract 226 properties and 147 assertions. The verification took 1017 seconds and led to the discovery of 4 false formulas. ProB was then experimented with great success on several projects: Roissy Charles de Gaule airport shuttle, Barcelona line 9, San Paulo line 4, Paris line 1 and Algiers line 1. At that occasion, ProB was slightly improved in order to deal with large scale problems and well validated in order to ease its acceptance by a certification body.

Data Table Validation Tool

Alstom Transport Information Solutions decided to experiment a new approach by reusing successful features of previous experiments.

A new tool, DTVT, is defined and implemented. Its structure is presented below.

Input data is in csv format. Data items are identified through their container file and their name.

Supported basic types are INT, BOOL and STRING. Data items are sequences of these basic types. Values are extracted from xls files.

The verification rules are expressed using the B mathematical language and structured as B operations. Instead of having to deal with too large, quantified predicates, a verification rule is decomposed in small steps that allow displaying accurate error message helping to determine the source of the error. A rule is composed of one or several COUNTEREXAMPLEs. COUNTEREXAMPLEs are evaluated in the order they are defined. Keyword COUNTEREXAMPLE is followed by a formatted message (%1, %2, %3, etc. represent the value of the first, second, third parameter of the following ANY substitution). The ANY substitution allows to filter data or to calculate values. The ANY substitution is followed by an EXPECTED field. If some values of the parameters of the ANY substitution satisfy the predicate of that substitution but don’t satisfy the predicate of the EXPECTED field, the error message is displayed with its parameters instantiated.

ProB is the central tool for the verification. It has been modified in order to produce a file containing all counter examples detected (see figure 5) and slightly improved to better support some B keywords. DTVT has been experimented with success on several ongoing developments (Mexico, Toronto, Sao Paulo, and Panama) to verify up to 100,000 Excel cells against up to 500 rules. A first round allowed defining required concepts, intermediate constructs (predicates used by several rules) and formalizing a set of generic rules that are shared by all projects. During the next rounds, specific project rules and data files were added. A complete verification is performed in about 10 minutes, including the verification report. The process is completely automatic and can be replayed without any human intervention when data values are modified.


  • Leuschel, Michael and Falampin, Jérôme and Fabian, Fritz and Daniel, Plagge (2009), Automated Property Verification for Large Scale B Models. In: Proceedings FM 2009. Springer-Verlag.
  • Michael Leuschel (2012), Formal Mind, ProB, ProR and Data Validation with B, FM’2012, Industray Day.
  • Thierry Lecomte, Lilian Burdy, Michael Leuschel (2012), Formally Checking Large Data Sets in the Railways. In Proceedings of DS-Event-B 2012: Workshop on the experience of and advances in developing dependable systems in Event-B, in conjunction with ICFEM 2012 – Kyoto, Japan, November 13, 2012