Are all data useful? Inferring causality to predict flows across sewer and drainage systems using directed information and boosted regression trees.

Water Res

Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, United States.

Published: November 2018

As more sensor data become available across urban water systems, it is often unclear which of these new measurements are actually useful and how they can be efficiently ingested to improve predictions. We present a data-driven approach for modeling and predicting flows across combined sewer and drainage systems, which fuses sensor measurements with the output of a large numerical simulation model. Rather than adjusting the structure and parameters of the numerical model, as is commonly done when new data become available, our approach instead learns causal relationships between the numerically modeled outputs, distributed rainfall measurements, and measured flows. By treating an existing numerical model, even one that may be outdated, as just another data stream, we illustrate how to automatically select and combine the features that best explain flows at any given location. This allows new sensor measurements to be rapidly fused with existing knowledge of the system without requiring recalibration of the underlying physics. Our approach, based on Directed Information (DI) and Boosted Regression Trees (BRT), is evaluated by fusing measurements across nearly 30 rain gages, 15 flow locations, and the outputs of a numerical sewer model in the city of Detroit, Michigan: one of the largest combined sewer systems in the world. The results illustrate that the Boosted Regression Trees provide skillful predictions of flow, especially when compared to an existing numerical model. The innovation of this paper is the use of the Directed Information step, which selects only those inputs that are causally related to measurements at locations of interest. Better predictions are achieved when the Directed Information step is used because it reduces overfitting during the training phase of the predictive algorithm.
In the age of "big water data", this finding highlights the importance of screening all available data sources before using them as inputs to data-driven models, since more may not always be better. We discuss the generalizability of the case study and the requirements of transferring the approach to other systems.
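
The screening idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps Directed Information for a much simpler relative, a plug-in transfer entropy estimate with a one-step history on coarsely binned series, and it runs on synthetic data. The names `quantize`, `transfer_entropy`, and `screen`, the three-bin quantization, and the 0.05-bit threshold are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def quantize(series, bins=3):
    """Map each value to an equal-frequency bin index (0 .. bins-1) by rank."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    labels = [0] * len(series)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(series)
    return labels


def transfer_entropy(x, y, bins=3):
    """Plug-in estimate, in bits, of I(Y_t ; X_{t-1} | Y_{t-1}).

    A simplified stand-in for Directed Information: one-step histories only.
    """
    xq, yq = quantize(x, bins), quantize(y, bins)
    # Count joint occurrences of (y_t, y_{t-1}, x_{t-1}).
    triples = Counter(zip(yq[1:], yq[:-1], xq[:-1]))
    n = sum(triples.values())
    pair_yy, pair_yx, single_y = Counter(), Counter(), Counter()
    for (yt, yp, xp), c in triples.items():
        pair_yy[(yt, yp)] += c
        pair_yx[(yp, xp)] += c
        single_y[yp] += c
    te = 0.0
    for (yt, yp, xp), c in triples.items():
        # Ratio p(y_t | y_{t-1}, x_{t-1}) / p(y_t | y_{t-1}).
        ratio = (c / pair_yx[(yp, xp)]) / (pair_yy[(yt, yp)] / single_y[yp])
        te += (c / n) * math.log2(ratio)
    return te


def screen(candidates, target, threshold=0.05):
    """Keep candidate inputs whose estimated transfer entropy to the target
    exceeds an illustrative, hand-picked threshold (an assumption)."""
    scores = {name: transfer_entropy(s, target) for name, s in candidates.items()}
    kept = [name for name, score in scores.items() if score > threshold]
    return sorted(kept, key=lambda name: -scores[name])


# Synthetic example: flow is driven by the previous step's rain,
# while the "unrelated" series has no influence on flow.
random.seed(0)
n = 2000
rain = [random.gauss(0, 1) for _ in range(n)]
unrelated = [random.gauss(0, 1) for _ in range(n)]
flow = [0.0] * n
for t in range(1, n):
    flow[t] = 0.6 * flow[t - 1] + rain[t - 1] + random.gauss(0, 0.3)

selected = screen({"rain": rain, "unrelated": unrelated}, flow)
```

In a pipeline like the one the abstract describes, only the series named in `selected` would then be passed as inputs to a boosted regression tree model (for example, scikit-learn's `GradientBoostingRegressor`), which is how the screening step limits overfitting to uninformative inputs.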

Source
http://dx.doi.org/10.1016/j.watres.2018.09.009
