A statistical system is a complex system of data collection, data processing, statistical analysis, etc. The following figure (Sundgren (2004)) shows a statistical system as a precisely defined, man-designed system that measures external reality. It shows two main macro functions: “Planning and control system” and “Statistical production system”.
This is a general, synthesized view of the statistical system; it could represent a single survey, a whole statistical office, or even an international organization. How such a system is built and organized in real life varies greatly. Some implementations have worked quite well so far and others not so well. The local environments of statistical systems differ slightly, but the big changes in the environment are increasingly global. However well a system has performed so far, some global changes are so big that every system has to adapt and change (del 3.2). Independently of any specific system, what the figure shows is a strong interaction, or hysteresis, of the system with the real world, and an overlap between the two main macro functions to account for the requests coming from the real world.
In the context of the Ess-Net, we identify this overlap as the effective Data Warehouse (DW), in which we can store statistical information from several statistical domains to support any analysis behind strategic decisions related to statistics, whether taken by an NSI or at European level. This suggests a new possible approach to statistical production based on a DW architecture; we define this specific approach as the Statistical Data Warehouse (S-DWH).
In an S-DWH, the main purpose is to integrate and store data generated as a result of an organization's activities in different production departments, with the aim of optimizing the supply chain or carrying out marketing strategies.
“The stovepipe model is the outcome of a historic process in which statistics in individual domains have developed independently. It has a number of advantages: the production processes are best adapted to the corresponding products; it is flexible in that it can adapt quickly to relatively minor changes in the underlying phenomena that the data describe; it is under the control of the domain manager and it results in a low-risk business architecture, as a problem in one of the production processes should normally not affect the rest of the production.” (Terminology Relating To The Implementation Of The Vision On The Production Method Of Eu Statistics)
“However, the stovepipe model also has a number of disadvantages. First, it may impose an unnecessary burden on respondents when the collection of data is conducted in an uncoordinated manner and respondents are asked for the same information more than once. Second, the stovepipe model is not well adapted to collect data on phenomena that cover multiple dimensions, such as globalisation, sustainability or climate change. Last but not least, this way of production is inefficient and costly, as it does not make use of standardisation between areas and collaboration between Member States. Redundancies and duplication of work, be it in development, in production or in dissemination processes are unavoidable in the stovepipe model. These inefficiencies and costs for the production of national data are further amplified when it comes to collecting and integrating regional data, which are indispensable for the design, monitoring and evaluation of some EU policies.” (Terminology Relating To The Implementation Of The Vision On The Production Method Of Eu Statistics)
2 Augmented stovepipe model
As indicated in the previous paragraph, the stovepipe model describes the predominant situation within the ESS, where statistics are produced in numerous parallel processes. The adjective "augmented" indicates that the same model is reproduced and added at Eurostat level.
To produce European statistics, Eurostat also compiles the data coming from individual NSIs area by area. The same stovepipe model thus exists in Eurostat, where the harmonised data in a particular statistical domain are aggregated to produce European statistics in that domain. The traditional approach for the production of European statistics based on the stovepipe model can thus be labelled as an "augmented" stovepipe model, in that the European level is added to the national level. (Terminology Relating To The Implementation Of The Vision On The Production Method Of Eu Statistics)
3 The Data Warehouse approach
“Innovative way of producing statistics based on the combination of various data sources in order to streamline the production process. This integration is twofold:
horizontal integration across statistical domains at the level of National Statistical Institutes and Eurostat. Horizontal integration means that European statistics are no longer produced domain by domain and source by source but in an integrated fashion, combining the individual characteristics of different domains/sources in the process of compiling statistics at an early stage, for example households or business surveys.
vertical integration covering both the national and EU levels. Vertical integration should be understood as the smooth and synchronized operation of information flows at national and ESS levels, free of obstacles from the sources (respondents or administration) to the final product (data or metadata). Vertical integration consists of two elements: joint structures, tools and processes and the so-called European approach to statistics (see this entry).”
(Terminology Relating To The Implementation Of The Vision On The Production Method Of Eu Statistics)
“The present "augmented" stovepipe model, has a certain number of disadvantages (burden on respondents, not suitable for surveying multi-dimensional phenomena, inefficiencies and high costs). By integrating data sets and combining data from different sources (including administrative sources) the various disadvantages of the stovepipe model could be avoided. This new approach would improve efficiency by elimination of unnecessary variation and duplication of work and create free capacities for upcoming information needs.”
“However, this will require an investigation into how information from different sources can be merged and exploited for different purposes, for instance by eliminating methodological differences or by making statistical classifications uniform.” (Terminology Relating To The Implementation Of The Vision On The Production Method Of Eu Statistics).
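As a rough illustration of what making classifications uniform before merging might involve, the following sketch maps two source-specific activity codings onto one shared classification and then combines the records per unit. All names, codes and figures (`SURVEY_TO_COMMON`, `ADMIN_TO_COMMON`, `harmonise`, `merge_by_unit`, the unit records) are hypothetical and invented for illustration; they do not come from the cited document.

```python
# Hypothetical example: two sources describe the same units with different
# activity codings. All codes and values below are invented for illustration.

# Source-specific code -> shared classification code (assumed mappings).
SURVEY_TO_COMMON = {"A1": "MANUF", "A2": "RETAIL"}
ADMIN_TO_COMMON = {"10": "MANUF", "47": "RETAIL"}

survey_records = [
    {"unit_id": "U1", "activity": "A1", "turnover": 120.0},
    {"unit_id": "U2", "activity": "A2", "turnover": 80.0},
]
admin_records = [
    {"unit_id": "U1", "activity": "10", "employees": 15},
    {"unit_id": "U3", "activity": "47", "employees": 4},
]


def harmonise(records, mapping):
    """Return copies of the records with activity recoded to the shared classification."""
    return [{**r, "activity": mapping[r["activity"]]} for r in records]


def merge_by_unit(*sources):
    """Combine records from several harmonised sources into one record per unit."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["unit_id"], {}).update(record)
    return merged


integrated = merge_by_unit(
    harmonise(survey_records, SURVEY_TO_COMMON),
    harmonise(admin_records, ADMIN_TO_COMMON),
)
```

Here unit `U1` appears in both sources and ends up as a single record carrying both turnover and employees, while `U2` and `U3` keep only the variables their source provides. A real integration would also need to handle conflicting values and unmapped codes, which this sketch ignores.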
“To go from a conceptually integrated system such as the SNA to a practically integrated system is a long term project and will demand integration in the production of primary statistics. This is the priority objective that Eurostat has given to the European Statistical System through its 2009 Communication to the European Parliament and the European Council on the production method of EU statistics ("a vision for the new decade").” (Guidelines on Integrated Economic Statistics - Eurostat answer)
3.2 Data Warehouse model
The main purpose of a data warehouse is to integrate and store data generated as a result of an organization's activities. A data warehouse system may be the whole production infrastructure or one of its several components; using the data coming from different production departments, it is generally used to optimize the supply chain or to carry out marketing.
From a statistical production point of view, in addition to the stovepipe model, the augmented stovepipe model and the integration model, W. Radermacher, A. Baigorri, D. Delcambre, W. Kloek, H. Linden (2009) also describe the warehouse approach, defined as: “The warehouse approach provides the means to store data once, but use it for multiple purposes. A data warehouse treats information as a reusable asset. Its underlying data model is not specific to a particular reporting or analytic requirement. Instead of focusing on a process-oriented design, the underlying repository design is modelled based on data inter-relationships that are fundamental to the organisation across processes.”
Conceptual model of data warehousing in the ESS (European Statistical System)
(W. Radermacher, A. Baigorri, D. Delcambre, W. Kloek, H. Linden (2009))
“Based on this approach statistics for specific domains should not be produced independently from each other, but as integrated parts of comprehensive production systems, called data warehouses. A data warehouse can be defined as a central repository (or "storehouse") for data collected via various channels.” (W. Radermacher, A. Baigorri, D. Delcambre, W. Kloek, H. Linden (2009)).
In the remainder of this document, a statistical production system model combining the integrated model and the warehouse approach will be referred to as a Statistical Data Warehouse (S-DWH).
In NSIs where the statistical production processes for different topics follow stovepipe-like production lines, i.e. independent statistical production processes, the output system is generally used to collect final aggregate data. When several statistical production processes sit inside a common S-DWH, aggregate data on different topics should not be produced independently of each other but as integrated parts of a comprehensive information system. In this case statistical concepts and infrastructures are shared, and the data in a common statistical domain are stored once for multiple purposes.
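The idea of storing data once and reusing it for several outputs can be sketched as follows. This is a minimal illustration under assumed names (`MicrodataStore`, `load`, `aggregate`) and invented records, not a description of any actual S-DWH implementation.

```python
from collections import defaultdict


class MicrodataStore:
    """Hypothetical single store for unit-level records shared across domains."""

    def __init__(self):
        self._records = []

    def load(self, records):
        """Store each record once; all outputs are later derived from this copy."""
        self._records.extend(records)

    def aggregate(self, group_by, measure):
        """Sum `measure` over the stored records, grouped by `group_by`."""
        totals = defaultdict(float)
        for record in self._records:
            totals[record[group_by]] += record[measure]
        return dict(totals)


store = MicrodataStore()
store.load([
    {"unit_id": "U1", "region": "North", "activity": "MANUF", "turnover": 120.0, "employees": 15},
    {"unit_id": "U2", "region": "North", "activity": "RETAIL", "turnover": 80.0, "employees": 6},
    {"unit_id": "U3", "region": "South", "activity": "RETAIL", "turnover": 50.0, "employees": 4},
])

# Two different outputs derived from the same stored records.
turnover_by_region = store.aggregate("region", "turnover")
employment_by_activity = store.aggregate("activity", "employees")
```

In a stovepipe arrangement, each of the two outputs would typically come from its own collection and its own copy of the data; here both are views over one shared set of records.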
In all these cases, the S-DWH is the central part of the whole IT infrastructure for supporting statistical production and corresponds to a system able to manage all phases of a statistical production process.
In the following we will describe a generic S-DWH as: a central statistical data store, regardless of the data’s source, for managing all available data of interest, improving the NSI’s capability to:
(re)use data to create new data/new outputs;
produce the necessary information.
This relates to a central repository able to manage several kinds of data (micro, macro and meta) in order to support cross-domain production processes, fully integrated in terms of data, metadata, processes and instruments, and also supporting the definition of new statistical strategies, for new statistical designs or updates.