Using Databases with Dispersion Modeling: Calculations With Queries


We have all seen a Rube Goldberg machine. If you don’t know what one is, Wikipedia describes it as “a machine intentionally designed to perform a simple task in an indirect and over-complicated fashion”. Do a Google search on the term. You will see some truly amazing efforts for such simple tasks.

I have been a dispersion modeler for 20+ years, and it seems to me that performing a dispersion modeling analysis, e.g. compiling the input data, processing the output data, and generating the AQA report, is very much like a Rube Goldberg.

Then I discovered the database and learned a few ways to help me process the output files. A few weeks later, I learned how I could construct parts of input files. After another week or two I could generate an entire set of input files for a project in a few steps.

Why go through all that effort when I had access to one of the commercial dispersion modeling interfaces? Was I building my own Rube Goldberg? At the time, some of the source options that I needed for a significant project were not available in the commercial interface. Those options would not be available for several months and I had to be finished with my analysis before that time. So I had no choice but to build my own whatever it was. When I was done, it was by no means complicated but very efficient.

The Turning Point

The turning point, for me, in using databases for dispersion modeling came when I learned how to perform calculations within queries. Not only could I use numerical formulas but I could manipulate and create text strings.

Example: Simple Formulas

In most cases, emission source parameters for air quality permitting are provided in English units. However, the values of these parameters must be in SI units to be used by most dispersion models. When validating model inputs for consistency with the air quality permit application, typically the reviewer looks through the spreadsheet the applicant provided performing the conversion. Many times the values in the spreadsheet did not match exactly with the values in the model input files, typically due to rounding.  Whether the values matched exactly or not, each spreadsheet cell had to be looked at for comparison. With dozens of sources and dozens of model input files, thousands of items needed to be checked between a spreadsheet and a text file. Sounds pretty darn tedious and time consuming. Welcome to the world of a dispersion modeling reviewer.

To save time, effort, and my sanity, I came up with a different approach rather than eye-balling each number. I would import all the model input files to a database table (we have a tool for that), then, using a SELECT query with the DISTINCT reserved word, I would extract unique records that included source id, height, exhaust temperature, velocity, and diameter for point sources and height, length, and width for area sources, then convert all the values to English units.

The statement looked something like this:

SELECT DISTINCT tblSource_PropValues.source_id,
tblSource_PropValues.Source_prop_value: Switch(
tblSource_PropValues.Source_prop='POINT_H',(tblSource_PropValues.Source_prop_value)/0.3048,
tblSource_PropValues.Source_prop='POINT_D',(tblSource_PropValues.Source_prop_value)/0.3048,
tblSource_PropValues.Source_prop='POINT_V',(tblSource_PropValues.Source_prop_value)/0.3048,
tblSource_PropValues.Source_prop='POINT_T',(tblSource_PropValues.Source_prop_value)-273.15)*(9/5)+32)
FROM  tblSource_PropValues

The Switch function in Access is equivalent to a Case statement. For each potential point source parameter (property), I could assign a conversion factor.

If I created a table from the values in the applicant’s spreadsheet, I would be able to compare whether the values matched those calculated in the query above with one click. What used to take 30 minutes of spot checking a few values had become about 5 minutes to check every single value. Though I did have to invest an hour or so up front to develop a database and workflow, I easily saw a return on my investment after 2 or 3 projects.

Later, I reversed the process to take data from the TCEQ emissions inventory database, which has stack parameters in English units, to produce model input files using string concatenation, such as:

SELECT
'SO SRCPARAM ' & 
tblSource_PropValues.ID & ' ' & 
tblSource_PropValues.Rate & ' ' &  
tblSource_PropValues.H & ' ' &  
tblSource_PropValues.T & ' ' &  
tblSource_PropValues.V & ' ' &  
tblSource_PropValues.D AS SRCPARAM
FROM  tblSource_PropValues

If any parameter values changed during the project, only the values in the database need to be updated, then re-run the query to generate new input files. No hunting for values in text files. A by-product of having all the input parameters in a database is the database can be linked to my report and GIS software. When any value needs to be updated, it need only be updated once and the updates propagate to the model input files, report, and graphics without another key stroke and eliminating the possibility of data entry errors.

Built-In Functions and More Functions

Besides just simple arithmetic, I discovered a wealth of built-in functions, like Switch, that could be used to manipulate data to either review someone else’s modeling or generate my own model input. Nearly every function I ever used in a spreadsheet, I could use in a database.

There are mathematical functions like Sqr, Sgn, Exp, Log, Int, ROUND, Sin, Cos,  and Tan; Text/String functions like LCase, Trim, Mid, StrReverse, and Chr; Date/Time functions like DateDiff, Year, Month, Day, Hour; data type conversion functions like CDbl, DLng, CStr; Aggregate functions like Avg, Max, Min, Sum, Count; and many more depending on which database platform you choose to use. The functions listed above are those available in Access.

If you decide to use a database platform like SQL Server, Oracle, MySQL, etc., if a function you need does not exist, these platforms allow for user defined functions to be created. One example of a potential user defined function for use in dispersion modeling would be for the effective diameter of flares. When all of your source parameters are in a table, with one click, calculate the effective diameters of one, one hundred, or one thousand flares.

Summary

Putting together all the pieces for a dispersion modeling analysis, e.g. source characteristics, building locations, terrain elevations, meteorological observations, and ambient monitoring concentrations, to name some of the pieces, can be a bit overwhelming and seem like a maze you are never quite sure how to navigate through. Though I have listed dozens of various functions you can use in a SQL statement with a couple of semi-cryptic example, using a database really can make the dispersion modeling process more streamlined and less tedious.

I know most of you are skeptical, but the next time you are using one of the commercial software interface products to put your project pieces together, count the number of buttons and menu items you can choose from and count the number of mouse clicks and entries you have to make. If something new could cut those numbers in half or more, would you be interested?

If you found this article informative, there is more helpful and actionable information for you.  Go to http://learn.naviknow.com to see a list of past webinar mini-courses. Every Wednesday (Webinar Wednesday), NaviKnow is offering FREE webinar mini-courses on topics related to air quality dispersion modeling and air quality permitting. If you want to be on our email list, drop me a line at [email protected].

One of the goals of NaviKnow is to create an air quality professional community to share ideas and helpful hints like those covered in this article. So if you found this article helpful, please share with a colleague.