An automatic report for the dataset : San Diego Housing Prices

The Relational Automatic Statistician
Abstract

This report was produced by the Automatic Bayesian Covariance Discovery (ABCD) algorithm.

1 Executive summary

The raw data and full model posterior with extrapolations are shown in figure 1.

The structure search algorithm has identified six additive components in the data. The first 3 additive components explain 99.5% of the variation in the data, as shown by the coefficient of determination ($R^{2}$) values in table 1. After the first 5 components, the cross-validated mean absolute error (MAE) does not decrease by more than 0.1%. This suggests that subsequent terms are modelling very short-term trends or uncorrelated noise, or are artefacts of the model or search procedure. Short summaries of the additive components are as follows:

• A constant.

• A smooth function with linearly decreasing marginal standard deviation.

• A linearly decreasing function. This function applies from Sep 2009 until Nov 2011.

• A smooth function with linearly decreasing marginal standard deviation. This function applies until Sep 2009 and from Nov 2011 onwards.

• A smooth function.

• Uncorrelated noise.

Model checking statistics are summarised in table 2 in section 4. These statistics have not revealed any inconsistencies between the model and observed data.

The rest of the document is structured as follows. In section 2 the forms of the additive components are described and their posterior distributions are displayed. In section 3 the modelling assumptions of each component are discussed with reference to how this affects the extrapolations made by the model. Section 4 discusses model checking statistics, with plots showing the form of any detected discrepancies between the model and observed data.

2 Detailed discussion of additive components

2.1 Component 1 : A constant

This component is constant.

This component explains 0.0% of the total variance. The addition of this component reduces the cross validated MAE by 82.6% from 190.9 to 33.3.

2.2 Component 2 : A smooth function with linearly decreasing marginal standard deviation

This component is a smooth function with a typical lengthscale of 2.2 years. The marginal standard deviation of the function decreases linearly.

This component explains 80.2% of the residual variance; this increases the total variance explained from 0.0% to 80.2%. The addition of this component reduces the cross validated MAE by 62.07% from 33.30 to 12.63.
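The percentages quoted for each component follow from simple arithmetic on the figures in this section. The sketch below reproduces two of them; the function names are illustrative and are not part of the ABCD software. It assumes that "residual variance explained" is the fraction of the variance left unexplained by the preceding components.

```python
def mae_reduction_pct(before: float, after: float) -> float:
    """Percentage reduction in MAE when a component is added."""
    return 100.0 * (before - after) / before


def cumulative_r2_pct(previous_pct: float, residual_pct: float) -> float:
    """Total variance explained after a component accounts for
    residual_pct of the variance that was still unexplained."""
    remaining = 100.0 - previous_pct
    return previous_pct + remaining * residual_pct / 100.0


# Component 2: MAE falls from 33.30 to 12.63.
component_2_mae_drop = round(mae_reduction_pct(33.30, 12.63), 2)
# Component 3: explains 97.4% of the residual left after 80.2% was explained.
component_3_total_r2 = round(cumulative_r2_pct(80.2, 97.4), 1)

print(component_2_mae_drop)  # 62.07
print(component_3_total_r2)  # 99.5
```

Both values agree with the figures reported for components 2 and 3.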

2.3 Component 3 : A linearly decreasing function. This function applies from Sep 2009 until Nov 2011

This component is linearly decreasing. This component applies from Sep 2009 until Nov 2011.

This component explains 97.4% of the residual variance; this increases the total variance explained from 80.2% to 99.5%. The addition of this component reduces the cross validated MAE by 77.29% from 12.63 to 2.87.

2.4 Component 4 : A smooth function with linearly decreasing marginal standard deviation. This function applies until Sep 2009 and from Nov 2011 onwards

This component is a smooth function with a typical lengthscale of 4.5 months. The marginal standard deviation of the function decreases linearly. This component applies until Sep 2009 and from Nov 2011 onwards.

This component explains 95.1% of the residual variance; this increases the total variance explained from 99.5% to 100.0%. The addition of this component reduces the cross validated MAE by 20.89% from 2.87 to 2.27.

2.5 Component 5 : A smooth function

This component is a smooth function with a typical lengthscale of 7.2 weeks.

This component explains 85.9% of the residual variance; the total variance explained remains at 100.0% to one decimal place. The addition of this component reduces the cross-validated MAE by 0.31% from 2.27 to 2.26.

2.6 Component 6 : Uncorrelated noise

This component models uncorrelated noise.

This component explains 100.0% of the residual variance; the total variance explained remains at 100.0% to one decimal place. The addition of this component reduces the cross-validated MAE by 0.00% from 2.26 to 2.26. This component explains residual variance but does not improve MAE, which suggests that it describes very short-term patterns or uncorrelated noise, or is an artefact of the model or search procedure.

3 Extrapolation

Summaries of the posterior distribution of the full model are shown in figure 13. The plot on the left displays the mean of the posterior together with pointwise variance. The plot on the right displays three random samples from the posterior.

Below are descriptions of the modelling assumptions associated with each additive component and how they affect the predictive posterior. Plots of the pointwise posterior and samples from the posterior are also presented, showing extrapolations from each component and the cumulative sum of components.

3.1 Component 1 : A constant

This component is assumed to stay constant.

3.2 Component 2 : A smooth function with linearly decreasing marginal standard deviation

This component is assumed to continue smoothly but is also assumed to be stationary, so its distribution will return to the prior. The prior distribution places mass on smooth functions with a marginal mean of zero and a typical lengthscale of 2.2 years. The marginal standard deviation of the function is assumed to continue to decrease linearly until Jan 2026, after which it is assumed to start increasing linearly.
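One way to see why the marginal standard deviation falls and then rises is to write the component as the product of a squared-exponential kernel and a linear kernel. The sketch below assumes that form; the function name, the scale parameter, the evaluation years and the use of 2026.0 for Jan 2026 are all illustrative choices rather than values taken from the fitted model.

```python
import numpy as np


def se_times_lin(x1, x2, lengthscale, location, scale=1.0):
    """Squared-exponential kernel multiplied by a linear kernel
    centred at `location` (an assumed form for this component)."""
    x1 = np.asarray(x1, dtype=float)[:, None]
    x2 = np.asarray(x2, dtype=float)[None, :]
    se = np.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)
    lin = (x1 - location) * (x2 - location)
    return scale**2 * se * lin


# Hypothetical evaluation times in decimal years.
years = np.array([2016.0, 2021.0, 2026.0, 2031.0])
K = se_times_lin(years, years, lengthscale=2.2, location=2026.0)

# The prior marginal standard deviation is the square root of the diagonal,
# which equals scale * |x - location|.
marginal_std = np.sqrt(np.diag(K))
print(marginal_std)
```

Under this assumed form the standard deviation is proportional to the distance from the location parameter, so it shrinks to zero there and grows linearly afterwards.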

3.3 Component 3 : A linearly decreasing function. This function applies from Sep 2009 until Nov 2011

This component is assumed to stop before the end of the data and will therefore be extrapolated as zero.
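A component that applies only within a window can be sketched by multiplying it by a function that is close to one inside the window and close to zero outside. The example below uses a product of two sigmoids for this; the steepness value and the decimal-year encodings of Sep 2009 and Nov 2011 are assumptions made for illustration, not fitted values.

```python
import numpy as np


def window(x, start, end, steepness=50.0):
    """Smooth indicator: near 1 for start < x < end, near 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    rise = 1.0 / (1.0 + np.exp(-steepness * (x - start)))
    fall = 1.0 / (1.0 + np.exp(-steepness * (end - x)))
    return rise * fall


# Sep 2009 and Nov 2011 written approximately as decimal years.
start, end = 2009.67, 2011.83
times = np.array([2008.0, 2010.75, 2013.0])
weights = window(times, start, end)
print(weights)
```

Any function multiplied by these weights vanishes after the window closes, which is why this component contributes zero to the extrapolation.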

3.4 Component 4 : A smooth function with linearly decreasing marginal standard deviation. This function applies until Sep 2009 and from Nov 2011 onwards

This component is assumed to continue smoothly but is also assumed to be stationary, so its distribution will return to the prior. The prior distribution places mass on smooth functions with a marginal mean of zero and a typical lengthscale of 4.5 months. The marginal standard deviation of the function is assumed to continue to decrease linearly until Aug 2022, after which it is assumed to start increasing linearly.

3.5 Component 5 : A smooth function

This component is assumed to continue smoothly but is also assumed to be stationary, so its distribution will return to the prior. The prior distribution places mass on smooth functions with a marginal mean of zero and a typical lengthscale of 7.2 weeks.
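The return to the prior can be illustrated with a small Gaussian process regression under a squared-exponential kernel. This is a minimal sketch on synthetic data, not the fitted model: the training inputs, targets, noise level and unit prior variance are assumptions, and only the 7.2 week lengthscale is taken from the text.

```python
import numpy as np


def se_kernel(a, b, lengthscale, variance=1.0):
    """Stationary squared-exponential kernel."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return variance * np.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_test, lengthscale, noise=1e-2):
    """Posterior mean and pointwise variance of a zero-mean GP."""
    K = se_kernel(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    K_s = se_kernel(x_test, x_train, lengthscale)
    mean = K_s @ np.linalg.solve(K, y_train)
    cov = se_kernel(x_test, x_test, lengthscale) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)


x_train = np.linspace(0.0, 1.0, 20)  # synthetic inputs, in years
y_train = np.sin(6.0 * x_train)      # synthetic targets
lengthscale = 7.2 / 52.0             # 7.2 weeks expressed in years

# Test points: at the end of the data, one and five lengthscales beyond it.
x_test = np.array([1.0, 1.0 + lengthscale, 1.0 + 5 * lengthscale])
mean, var = gp_posterior(x_train, y_train, x_test, lengthscale)
print(mean, var)
```

In this sketch, five lengthscales past the last observation the posterior mean is close to the prior mean of zero and the pointwise variance is close to the prior variance of one.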

3.6 Component 6 : Uncorrelated noise

This component assumes the uncorrelated noise will continue indefinitely.

4 Model checking

Several posterior predictive checks have been performed to assess how well the model describes the observed data. These tests take the form of comparing statistics evaluated on samples from the prior and posterior distributions for each additive component. The statistics are derived from autocorrelation function (ACF) estimates, periodograms and quantile-quantile (qq) plots.

Table 2 displays cumulative probability and $p$-value estimates for these quantities. Cumulative probabilities near 0 or 1 indicate that the test statistic was, respectively, lower or higher under the posterior than under the prior unexpectedly often. They therefore carry the same information as a $p$-value for a two-tailed test, and additionally show whether the test statistic was higher or lower than expected. $p$-values near 0 indicate that the test statistic was larger in magnitude under the posterior than under the prior unexpectedly often.
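The following sketch shows one plausible way to estimate such a cumulative probability from paired samples, and how it maps to a two-tailed $p$-value. The estimator, the function names and the synthetic draws are assumptions made for illustration; the report does not state the exact procedure used to fill table 2.

```python
import numpy as np


def cumulative_probability(stat_prior, stat_posterior):
    """Fraction of paired draws in which the statistic is higher
    under the posterior than under the prior."""
    stat_prior = np.asarray(stat_prior, dtype=float)
    stat_posterior = np.asarray(stat_posterior, dtype=float)
    return float(np.mean(stat_posterior > stat_prior))


def two_tailed_p_value(cum_prob):
    """Two-tailed p-value implied by a cumulative probability."""
    return 2.0 * min(cum_prob, 1.0 - cum_prob)


rng = np.random.default_rng(0)
prior_draws = rng.normal(size=2000)        # synthetic statistic under the prior
consistent = rng.normal(size=2000)         # posterior statistic, same distribution
shifted = rng.normal(loc=3.0, size=2000)   # posterior statistic, systematically higher

c_consistent = cumulative_probability(prior_draws, consistent)
c_shifted = cumulative_probability(prior_draws, shifted)
print(c_consistent, two_tailed_p_value(c_consistent))
print(c_shifted, two_tailed_p_value(c_shifted))
```

When prior and posterior statistics share a distribution the cumulative probability sits near 0.5 and the implied $p$-value is large; a systematic shift pushes the cumulative probability towards 1 and the $p$-value towards 0.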

No statistically significant discrepancies between the data and model have been detected, but model checking plots for each component are presented below.

4.1 Model checking plots for components without statistically significant discrepancies

4.1.1 Component 1 : A constant

No discrepancies between the prior and posterior of this component have been detected.

4.1.2 Component 2 : A smooth function with linearly decreasing marginal standard deviation

No discrepancies between the prior and posterior of this component have been detected.

4.1.3 Component 3 : A linearly decreasing function. This function applies from Sep 2009 until Nov 2011

No discrepancies between the prior and posterior of this component have been detected.

4.1.4 Component 4 : A smooth function with linearly decreasing marginal standard deviation. This function applies until Sep 2009 and from Nov 2011 onwards

No discrepancies between the prior and posterior of this component have been detected.

4.1.5 Component 5 : A smooth function

No discrepancies between the prior and posterior of this component have been detected.

4.1.6 Component 6 : Uncorrelated noise

No discrepancies between the prior and posterior of this component have been detected.