Linear Regression and Cortex version 1.10

A historical project repository is one of the most important assets a company can possess.  It’s a treasure trove of information that is essential for future project planning and reducing systemic risks associated with budget and schedule overruns.

In fact, there may not be a more impactful topic in preconstruction today than how to use your data. It doesn’t always have to be “Big data.” Trustworthy “small data” delivers powerful results too.

Linear Regression is one of the most effective and straightforward statistical methods

Parametric models built on linear regression are common in the capital project industry, and they are especially useful for the vertical building industry as well. The method relies on independent (explanatory) variables that correlate strongly with a dependent variable. Typically the explanatory variable is a quantity, and the dependent variable we see most often is cost.

The R-squared (R2) value measures the closeness of this relationship, that is, how close the data are to the fitted regression line. For a least-squares line, values range from 0 to 1, and values closer to 1 indicate a closer fit. Also known as the coefficient of determination, R2 is commonly used as a proxy for how well the model predicts costs.
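To make the relationship concrete, here is a minimal sketch of a least-squares fit and its R2 value in plain Python. The function names and the project figures are hypothetical and chosen only for illustration; this is not how Cortex computes its trendline internally.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical projects: gross building area (sq ft) and cost (USD).
areas = [50_000, 80_000, 120_000, 150_000, 200_000]
costs = [12.0e6, 19.5e6, 27.0e6, 36.0e6, 46.5e6]

slope, intercept = fit_line(areas, costs)
print(f"cost = {slope:.2f} * area + {intercept:,.0f}")
print(f"R2 = {r_squared(areas, costs, slope, intercept):.4f}")
```

The slope reads as cost per unit of the explanatory variable, and the intercept as the fixed portion of cost that does not scale with it.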

To use your data effectively, you need to prepare the system to store your data reliably and help you understand the data. This lays the groundwork for developing tools that enable you to use your data for benchmarking and parametric estimating with cost estimating relationships.

Publish data using Microsoft Excel

Eos Cortex Project History is a repository for your data. Publishing projects to Cortex from Excel is simple and helps you understand your explanatory and dependent data. Because your projects are executed at different points in time and in different locations, Cortex can also normalize cost to today’s currency (or a future currency).
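As a rough illustration of what cost normalization involves, the sketch below restates a historical cost using the ratio of two cost index values and an optional location factor. The function, its parameters, and every number are assumptions made for this example; the article does not describe which indices or factors Cortex actually applies.

```python
def normalize_cost(cost, index_then, index_now,
                   location_then=1.0, location_now=1.0):
    """Restate a historical cost in target-date, target-location terms.

    index_then / index_now: cost index values at the project date and
    the target date (hypothetical inputs, e.g. from an escalation table).
    location_then / location_now: relative location factors.
    """
    time_adjusted = cost * (index_now / index_then)
    return time_adjusted * (location_now / location_then)


# Hypothetical project: 20M spent when the index stood at 100,
# restated to an index of 118 and a slightly cheaper location.
normalized = normalize_cost(
    20_000_000,
    index_then=100.0, index_now=118.0,
    location_then=1.05, location_now=0.98,
)
print(f"Normalized cost: {normalized:,.0f}")
```

Normalizing before fitting matters because otherwise the regression would treat escalation and regional price differences as if they were project-to-project variation.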

Consider a scenario where office building projects are represented with variables for project duration in months and cost. Using these two simple variables in scatter charts, Cortex provides a basic understanding of the closeness of the data.

With the release of Cortex Project History version 1.10, the scatter chart includes a linear regression trendline and its R2 value to provide a visual representation of how strongly the data correlate (R2 values closer to 1 indicate a closer fit). With an R2 value of 0.0606, there is not a strong correlation.

Also available in the new release are first and second standard deviation lines, which show how far the data points spread from the mean. In this example, four of the 18 projects fall outside the first standard deviation, which weakens the correlation.

Adding the second standard deviation shows how many projects fall outside two standard deviations. For roughly normally distributed data, that is typically 5% of the projects or fewer.
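One way to reproduce this kind of check is sketched below. It assumes the bands are drawn around the trendline at k times the sample standard deviation of the residuals; that is an assumption for illustration, and Cortex may define its standard deviation lines differently. The helper names and data are hypothetical.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def points_outside_band(xs, ys, k=1.0):
    """Indices of points more than k residual standard deviations
    away from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * sigma]


# Hypothetical data: duration in months vs. cost in millions.
durations = [10, 12, 14, 16, 18, 20, 22, 24]
costs = [21, 35, 18, 40, 27, 52, 30, 44]

print("Outside 1 std dev:", points_outside_band(durations, costs, k=1.0))
print("Outside 2 std dev:", points_outside_band(durations, costs, k=2.0))
```

Any point flagged at k=2 is also flagged at k=1, so the second list is always a subset of the first.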

Based on this data, you may not feel confident about using this model for benchmarking or conceptual estimating purposes. Let’s look at a different explanatory variable on the x-axis, project size based on gross building area.

With an R2 value of 0.4108 and all but one project within the first standard deviation, this model provides more confidence. To further refine your model, Cortex allows you to remove outliers, which gives the model a stronger correlation (perhaps the Fiesta Bravo project did not have a similar scope or was not executed properly). This is an important step as you refine your cost model for reuse.

After Cortex recasts the data, we see a significantly better R2 value of 0.8498, and all projects are within the first standard deviation.
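The effect of dropping an outlier can be sketched in a few lines. The project names and values below are invented for illustration and are not the data behind the charts in this article; the helper functions are likewise hypothetical.

```python
from statistics import mean


def fit_with_r2(xs, ys):
    """Least-squares fit returning (slope, intercept, r_squared)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


def refit_without(projects, excluded):
    """Refit using only the projects whose names are not excluded."""
    kept = [(x, y) for name, x, y in projects if name not in excluded]
    xs, ys = zip(*kept)
    return fit_with_r2(xs, ys)


# Hypothetical projects: (name, area in thousand sq ft, cost in millions).
projects = [
    ("Project A", 40, 10), ("Project B", 60, 15), ("Project C", 90, 22),
    ("Project D", 120, 30), ("Project E", 150, 37), ("Project F", 100, 60),
]

_, _, r2_all = refit_without(projects, excluded=set())
_, _, r2_trimmed = refit_without(projects, excluded={"Project F"})
print(f"R2 with all projects: {r2_all:.4f}")
print(f"R2 without Project F: {r2_trimmed:.4f}")
```

Excluding a project should be a documented judgment about scope or execution, not simply a way to raise R2, since the trimmed model no longer represents projects like the one removed.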

The final step is to save this model in a Cortex portfolio. Now you can reuse the model to benchmark new projects, or for conceptual estimating using the linear regression function. For more information on linear regression, see our white paper. For more information on Eos Cortex Project History, you can visit our website.
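Conceptually, reusing a saved linear model comes down to storing its slope and intercept and applying them to a new project. The sketch below assumes a hypothetical `LinearCostModel` class with made-up coefficients; it does not reflect how a Cortex portfolio stores or applies a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCostModel:
    slope: float      # cost per unit of the explanatory variable
    intercept: float  # portion of cost that does not scale with it
    x_min: float      # smallest explanatory value in the fitted data
    x_max: float      # largest explanatory value in the fitted data

    def estimate(self, x):
        """Conceptual estimate for a new project."""
        return self.slope * x + self.intercept

    def in_range(self, x):
        """True if x lies within the range the model was fitted on."""
        return self.x_min <= x <= self.x_max

    def variance_pct(self, x, actual_cost):
        """Benchmark: percent difference of an actual cost vs. the model."""
        expected = self.estimate(x)
        return 100.0 * (actual_cost - expected) / expected


# Hypothetical coefficients for cost (USD) vs. gross building area (sq ft).
model = LinearCostModel(slope=230.0, intercept=500_000.0,
                        x_min=50_000, x_max=200_000)

area = 135_000
print(f"Estimated cost: {model.estimate(area):,.0f}")
if not model.in_range(area):
    print("Warning: estimate extrapolates beyond the fitted data")
print(f"Benchmark variance: {model.variance_pct(area, 33_000_000):+.1f}%")
```

Keeping the fitted range with the model is a deliberate choice here: a linear model is only supported by the data it was built from, so estimates outside that range deserve extra scrutiny.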