An automatic report for the dataset : Currency exchange - Indonesian Rupiah (IDR)

The Relational Automatic Statistician
Abstract

This report was produced by the Automatic Bayesian Covariance Discovery (ABCD) algorithm.

1 Executive summary

The raw data and full model posterior with extrapolations are shown in figure 1.

Figure 1: Raw data (left) and model posterior with extrapolation (right)

The structure search algorithm has identified six additive components in the data. The first 3 additive components explain 98.4% of the variation in the data as shown by the coefficient of determination (R2) values in table 1. The first 4 additive components explain 99.5% of the variation in the data. After the first 3 components the cross validated mean absolute error (MAE) does not decrease by more than 0.1%. This suggests that subsequent terms are modelling very short term trends, uncorrelated noise or are artefacts of the model or search procedure. Short summaries of the additive components are as follows:

  • A smooth function. This function applies until 06 Jun 2015 and from 16 Jul 2015 onwards.

  • A constant. This function applies from 06 Jun 2015 until 16 Jul 2015.

  • A smooth function. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards.

  • Uncorrelated noise. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards.

  • Uncorrelated noise. This function applies from 26 Sep 2015 until 09 Oct 2015.

  • A very smooth function. This function applies from 26 Sep 2015 until 09 Oct 2015.

#   R2 (%)    ΔR2 (%)   Residual R2 (%)   Cross validated MAE   Reduction in MAE (%)
-   -         -         -                 4.13                  -
1   -3460.2   -3460.2   -3460.2           1.24                  70.0
2   79.6      3539.8    99.4              0.21                  82.7
3   98.4      18.8      92.1              0.18                  13.6
4   99.5      1.1       68.4              0.18                  0.0
5   99.8      0.3       58.0              0.18                  0.0
6   100.0     0.2       100.0             0.19                  -1.5
Table 1: Summary statistics for cumulative additive fits to the data. The residual coefficient of determination (R2) values are computed using the residuals from the previous fit as the target values; this measures how much of the residual variance is explained by each new component. The mean absolute error (MAE) is calculated using 10 fold cross validation with a contiguous block design; this measures the ability of the model to interpolate and extrapolate over moderate distances. The model is fit using the full data and the MAE values are calculated using this model; this double use of data means that the MAE values cannot be used reliably as an estimate of out-of-sample predictive performance.
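
The contiguous block design mentioned in the caption can be illustrated with a minimal sketch. It assumes the folds are simply ten consecutive, nearly equal index blocks. The names contiguous_folds, blocked_cv_mae and mean_baseline are hypothetical and are not taken from the ABCD code. The sketch also refits a stand-in predictor on each training set, which is an assumption rather than the report's exact procedure.

```python
import numpy as np

def contiguous_folds(n_points, n_folds=10):
    """Split indices 0..n_points-1 into contiguous, nearly equal blocks."""
    return np.array_split(np.arange(n_points), n_folds)

def blocked_cv_mae(x, y, fit_predict, n_folds=10):
    """Mean absolute error over held-out contiguous blocks.

    fit_predict(x_train, y_train, x_test) is a hypothetical callable that
    returns predictions at x_test.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    abs_errors = []
    for test_idx in contiguous_folds(len(x), n_folds):
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[test_idx] = False  # hold out one whole block
        pred = fit_predict(x[train_mask], y[train_mask], x[test_idx])
        abs_errors.append(np.abs(y[test_idx] - pred))
    return float(np.mean(np.concatenate(abs_errors)))

def mean_baseline(x_train, y_train, x_test):
    """Stand-in predictor: the training mean at every test input."""
    return np.full(len(x_test), y_train.mean())

# Toy data (not the IDR series): a constant series is predicted exactly.
x = np.arange(25.0)
y = np.full(25, 3.0)
print(blocked_cv_mae(x, y, mean_baseline))  # 0.0
```

Because each held-out block is contiguous, the predictor must interpolate or extrapolate across a gap rather than fill in isolated points, which is the property the caption refers to.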

Model checking statistics are summarised in table 2 in section 4. These statistics have revealed highly statistically significant discrepancies between the data and model in components 1 and 2. Moderate discrepancies have also been detected in component 3.

The rest of the document is structured as follows. In section 2 the forms of the additive components are described and their posterior distributions are displayed. In section 3 the modelling assumptions of each component are discussed with reference to how this affects the extrapolations made by the model. Section 4 discusses model checking statistics, with plots showing the form of any detected discrepancies between the model and observed data.

2 Detailed discussion of additive components

2.1 Component 1 : A smooth function. This function applies until 06 Jun 2015 and from 16 Jul 2015 onwards

This component is a smooth function with a typical lengthscale of 4.1 weeks. This component applies until 06 Jun 2015 and from 16 Jul 2015 onwards.

This component explains -3460.2% of the total variance. The addition of this component reduces the cross validated MAE by 70.0% from 4.13 to 1.24.
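
A negative coefficient of determination is possible because R2 = 1 - SS_res / SS_tot drops below zero whenever a fit is worse than predicting the mean of the data. The report does not state why the value is so low here. The sketch below uses toy numbers (not the IDR series) and a hypothetical helper r_squared only to show that a fit with the right shape but a missing level can have a strongly negative R2.

```python
import numpy as np

def r_squared(y, fit):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    fit = np.asarray(fit, dtype=float)
    ss_res = np.sum((y - fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy data: small wiggles around a level of 10.
y = np.array([9.0, 10.0, 11.0, 10.0])
zero_mean_fit = np.array([-1.0, 0.0, 1.0, 0.0])  # right shape, level missing

print(r_squared(y, zero_mean_fit))         # -199.0
print(r_squared(y, zero_mean_fit + 10.0))  # 1.0
```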

Figure 2: Pointwise posterior of component 1 (left) and the posterior of the cumulative sum of components with data (right)
Figure 3: Pointwise posterior of residuals after adding component 1

2.2 Component 2 : A constant. This function applies from 06 Jun 2015 until 16 Jul 2015

This component is constant. This component applies from 06 Jun 2015 until 16 Jul 2015.

This component explains 99.4% of the residual variance; this increases the total variance explained from -3460.2% to 79.6%. The addition of this component reduces the cross validated MAE by 82.73% from 1.24 to 0.21.
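
The link between the residual and total figures can be checked with a short calculation. The sketch assumes R2_k = 1 - SS_k / SS_tot for the cumulative fit and residual R2 = 1 - SS_k / SS_(k-1), with residual sums of squares taken about zero. This definition is an assumption, and residual_r2 is a hypothetical helper.

```python
def residual_r2(r2_prev, r2_new):
    """Residual R2 (as a fraction) implied by two cumulative R2 fractions.

    Assumes R2_k = 1 - SS_k / SS_tot, so SS_k / SS_(k-1) = (1 - R2_k) / (1 - R2_(k-1)).
    """
    return 1.0 - (1.0 - r2_new) / (1.0 - r2_prev)

# Cumulative R2 values from Table 1, converted from percent to fractions.
cumulative = [-34.602, 0.796, 0.984, 0.995, 0.998, 1.000]
for k in range(1, len(cumulative)):
    implied = 100.0 * residual_r2(cumulative[k - 1], cumulative[k])
    print(f"component {k + 1}: implied residual R2 = {implied:.1f}%")
```

For component 2 this gives approximately 99.4%, in agreement with the table. Later rows agree only roughly, which is expected because the inputs are rounded to one decimal place.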

Figure 4: Pointwise posterior of component 2 (left) and the posterior of the cumulative sum of components with data (right)
Figure 5: Pointwise posterior of residuals after adding component 2

2.3 Component 3 : A smooth function. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards

This component is a smooth function with a typical lengthscale of 4.0 days. This component applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards.

This component explains 92.1% of the residual variance; this increases the total variance explained from 79.6% to 98.4%. The addition of this component reduces the cross validated MAE by 13.63% from 0.21 to 0.18.

Figure 6: Pointwise posterior of component 3 (left) and the posterior of the cumulative sum of components with data (right)
Figure 7: Pointwise posterior of residuals after adding component 3

2.4 Component 4 : Uncorrelated noise. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards

This component models uncorrelated noise. This component applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards.

This component explains 68.4% of the residual variance; this increases the total variance explained from 98.4% to 99.5%. The addition of this component leaves the cross validated MAE unchanged at 0.18. This component explains residual variance but does not improve MAE, which suggests that this component describes very short term patterns, uncorrelated noise or is an artefact of the model or search procedure.

Figure 8: Pointwise posterior of component 4 (left) and the posterior of the cumulative sum of components with data (right)
Figure 9: Pointwise posterior of residuals after adding component 4

2.5 Component 5 : Uncorrelated noise. This function applies from 26 Sep 2015 until 09 Oct 2015

This component models uncorrelated noise. This component applies from 26 Sep 2015 until 09 Oct 2015.

This component explains 58.0% of the residual variance; this increases the total variance explained from 99.5% to 99.8%. The addition of this component leaves the cross validated MAE unchanged at 0.18. This component explains residual variance but does not improve MAE, which suggests that this component describes very short term patterns, uncorrelated noise or is an artefact of the model or search procedure.

Figure 10: Pointwise posterior of component 5 (left) and the posterior of the cumulative sum of components with data (right)
Figure 11: Pointwise posterior of residuals after adding component 5

2.6 Component 6 : A very smooth function. This function applies from 26 Sep 2015 until 09 Oct 2015

This component is a very smooth function. This component applies from 26 Sep 2015 until 09 Oct 2015.

This component explains 100.0% of the residual variance; this increases the total variance explained from 99.8% to 100.0%. The addition of this component increases the cross validated MAE by 1.54% from 0.18 to 0.19. This component explains residual variance but does not improve MAE, which suggests that this component describes very short term patterns, uncorrelated noise or is an artefact of the model or search procedure.

Figure 12: Pointwise posterior of component 6 (left) and the posterior of the cumulative sum of components with data (right)

3 Extrapolation

Summaries of the posterior distribution of the full model are shown in figure 13. The plot on the left displays the mean of the posterior together with pointwise variance. The plot on the right displays three random samples from the posterior.

Figure 13: Full model posterior with extrapolation. Mean and pointwise variance (left) and three random samples (right)

Below are descriptions of the modelling assumptions associated with each additive component and how they affect the predictive posterior. Plots of the pointwise posterior and samples from the posterior are also presented, showing extrapolations from each component and the cumulative sum of components.

3.1 Component 1 : A smooth function. This function applies until 06 Jun 2015 and from 16 Jul 2015 onwards

This component is assumed to continue smoothly but is also assumed to be stationary so its distribution will return to the prior. The prior distribution places mass on smooth functions with a marginal mean of zero and a typical lengthscale of 4.1 weeks.
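
The return to the prior can be illustrated with a small Gaussian process calculation. The sketch assumes the smooth function has a squared exponential covariance with unit prior variance. The report does not state the kernel form in this section, so this is an assumption. The training values are toy numbers, not the IDR data, and se_kernel and gp_posterior are hypothetical helpers.

```python
import numpy as np

def se_kernel(a, b, lengthscale, signal_var=1.0):
    """Squared exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, lengthscale, noise_var=1e-4):
    """Zero-mean GP posterior mean and pointwise variance at x_test."""
    k_tt = se_kernel(x_train, x_train, lengthscale)
    k_tt += noise_var * np.eye(len(x_train))  # small noise keeps the solve stable
    k_st = se_kernel(x_test, x_train, lengthscale)
    mean = k_st @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.sum(k_st * np.linalg.solve(k_tt, k_st.T).T, axis=1)
    return mean, var

lengthscale = 4.1                         # weeks, from the component description
x_train = np.linspace(0.0, 20.0, 21)      # toy weekly observations
y_train = np.sin(x_train / 3.0)           # toy values
x_test = 20.0 + np.array([0.0, 1.0, 2.0, 4.0, 8.0, 16.0]) * lengthscale

mean, var = gp_posterior(x_train, y_train, x_test, lengthscale)
for steps, m, v in zip((x_test - 20.0) / lengthscale, mean, var):
    print(f"{steps:5.1f} lengthscales ahead: mean={m:+.3f}, var={v:.3f}")
```

Under these assumptions the posterior mean decays towards the prior mean of zero and the pointwise variance grows back to the prior variance within a few lengthscales of the last observation.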

Figure 14: Posterior of component 1 (top) and cumulative sum of components (bottom) with extrapolation. Mean and pointwise variance (left) and three random samples from the posterior distribution (right).

3.2 Component 2 : A constant. This function applies from 06 Jun 2015 until 16 Jul 2015

This component is assumed to stop before the end of the data and will therefore be extrapolated as zero.
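
One way to see why a windowed component extrapolates as zero is to multiply a component by a smooth window function. The sketch assumes a window built from two sigmoids. This form, the steepness value and the day offsets are illustrative assumptions, and window and windowed_prior_variance are hypothetical helpers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def window(x, start, stop, steepness=1.0):
    """Smooth indicator: close to 1 inside [start, stop], close to 0 outside."""
    return sigmoid(steepness * (x - start)) * sigmoid(steepness * (stop - x))

def windowed_prior_variance(x, start, stop, base_variance=1.0):
    """Prior variance of w(x) * f(x) when f has constant variance base_variance."""
    return base_variance * window(x, start, stop) ** 2

# Days measured from 06 Jun 2015; the window closes 40 days later on 16 Jul 2015.
days = np.array([20.0, 140.0])  # mid-window, and 100 days after it closes
print(windowed_prior_variance(days, start=0.0, stop=40.0))
```

Well outside the window both the prior mean and the prior variance of the windowed component are effectively zero, so the component contributes nothing to the extrapolation.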

Figure 15: Posterior of component 2 (top) and cumulative sum of components (bottom) with extrapolation. Mean and pointwise variance (left) and three random samples from the posterior distribution (right).

3.3 Component 3 : A smooth function. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards

This component is assumed to continue smoothly but is also assumed to be stationary so its distribution will return to the prior. The prior distribution places mass on smooth functions with a marginal mean of zero and a typical lengthscale of 4.0 days.

Figure 16: Posterior of component 3 (top) and cumulative sum of components (bottom) with extrapolation. Mean and pointwise variance (left) and three random samples from the posterior distribution (right).

3.4 Component 4 : Uncorrelated noise. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards

This component assumes the uncorrelated noise will continue indefinitely.

Figure 17: Posterior of component 4 (top) and cumulative sum of components (bottom) with extrapolation. Mean and pointwise variance (left) and three random samples from the posterior distribution (right).

3.5 Component 5 : Uncorrelated noise. This function applies from 26 Sep 2015 until 09 Oct 2015

This component is assumed to stop before the end of the data and will therefore be extrapolated as zero.

Figure 18: Posterior of component 5 (top) and cumulative sum of components (bottom) with extrapolation. Mean and pointwise variance (left) and three random samples from the posterior distribution (right).

3.6 Component 6 : A very smooth function. This function applies from 26 Sep 2015 until 09 Oct 2015

This component is assumed to stop before the end of the data and will therefore be extrapolated as zero.

Figure 19: Posterior of component 6 (top) and cumulative sum of components (bottom) with extrapolation. Mean and pointwise variance (left) and three random samples from the posterior distribution (right).

4 Model checking

Several posterior predictive checks have been performed to assess how well the model describes the observed data. These tests take the form of comparing statistics evaluated on samples from the prior and posterior distributions for each additive component. The statistics are derived from autocorrelation function (ACF) estimates, periodograms and quantile-quantile (qq) plots.

Table 2 displays cumulative probability and p-value estimates for these quantities. Cumulative probabilities near 0 or 1 indicate that the test statistic was, respectively, lower or higher under the posterior than under the prior unexpectedly often. They therefore carry the same information as a p-value for a two-tailed test, and additionally indicate whether the test statistic was higher or lower than expected. p-values near 0 indicate that the test statistic was larger in magnitude under the posterior than under the prior unexpectedly often.
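
The correspondence between a cumulative probability and a two-tailed p-value can be made concrete. The sketch assumes the usual conversion p = 2 min(c, 1 - c), which the report does not spell out. The helper two_tailed_p is hypothetical.

```python
def two_tailed_p(cumulative_probability):
    """Two-tailed p-value implied by a cumulative probability c."""
    return 2.0 * min(cumulative_probability, 1.0 - cumulative_probability)

# Cumulative probabilities of the periodogram maximum for components 1 to 3 (Table 2).
for c in (1.000, 0.998, 0.736):
    print(f"cumulative probability {c:.3f} -> p = {two_tailed_p(c):.3f}")
```

Under this assumption the values 1.000 and 0.998 map to p-values of 0.000 and 0.004, which match the periodogram p-values quoted for components 1 and 2 in section 4.1.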

    ACF                 Periodogram         QQ
#   min       min loc   max       max loc   max       min
1   0.887     0.902     1.000     0.108     0.000     1.000
2   0.497     0.485     0.998     0.472     0.001     0.999
3   0.674     0.679     0.736     0.132     0.033     0.888
4   0.517     0.527     0.497     0.510     0.164     0.211
5   0.484     0.502     0.516     0.470     0.489     0.446
6   0.494     0.520     0.117     0.499     0.459     0.283
Table 2: Model checking statistics for each component. Cumulative probabilities for minimum of autocorrelation function (ACF) and its location. Cumulative probabilities for maximum of periodogram and its location. p-values for maximum and minimum deviations of QQ-plot from straight line.
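
The ACF and periodogram statistics in Table 2 can be sketched as follows. This is an illustration under stated assumptions, not the report's implementation: the normalisations, the maximum lag and the way prior and posterior samples are paired are all assumptions, and the function names are hypothetical. The series are synthetic.

```python
import numpy as np

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    centred = series - series.mean()
    denom = np.dot(centred, centred)
    return np.array([
        np.dot(centred[: len(centred) - lag], centred[lag:]) / denom
        for lag in range(max_lag + 1)
    ])

def periodogram(series):
    """Squared magnitude of the real FFT, excluding the zero frequency."""
    centred = series - series.mean()
    return np.abs(np.fft.rfft(centred))[1:] ** 2 / len(series)

def check_statistics(series, max_lag=20):
    """Minimum of the ACF, maximum of the periodogram, and their locations."""
    a = acf(series, max_lag)
    p = periodogram(series)
    return {
        "acf_min": a.min(), "acf_min_loc": int(a.argmin()),
        "pgram_max": p.max(), "pgram_max_loc": int(p.argmax()) + 1,
    }

def cumulative_probability(prior_stats, posterior_stats):
    """Fraction of sample pairs in which the posterior statistic exceeds the prior one."""
    prior_stats = np.asarray(prior_stats)
    posterior_stats = np.asarray(posterior_stats)
    return float(np.mean(posterior_stats[:, None] > prior_stats[None, :]))

# Synthetic demonstration: prior samples are white noise, posterior samples
# contain a sinusoid with period 10 that the prior does not expect.
rng = np.random.default_rng(0)
t = np.arange(200)
prior = [check_statistics(rng.standard_normal(200))["pgram_max"] for _ in range(50)]
post = [check_statistics(3.0 * np.sin(2 * np.pi * t / 10) + rng.standard_normal(200))["pgram_max"]
        for _ in range(50)]
print(cumulative_probability(prior, post))
```

A cumulative probability close to 1 for the periodogram maximum, as in this synthetic case, is the pattern the report interprets as periodicity not captured by the model.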

The nature of any observed discrepancies is now described and plotted and hypotheses are given for the patterns in the data that may not be captured by the model.

4.1 Highly statistically significant discrepancies

4.1.1 Component 1 : A smooth function. This function applies until 06 Jun 2015 and from 16 Jul 2015 onwards

The following discrepancies between the prior and posterior distributions for this component have been detected.

  • The maximum value of the periodogram is unexpectedly high. This discrepancy has an estimated p-value of 0.000.

  • The qq plot has an unexpectedly large positive deviation from equality (x=y). This discrepancy has an estimated p-value of 0.000.

The large maximum value of the periodogram can indicate periodicity that is not being captured by the model. The positive deviation in the qq-plot can indicate heavy positive tails if it occurs at the right of the plot or light negative tails if it occurs at the left.
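
The reading of a positive qq deviation can be illustrated with a toy comparison of matched quantiles. The sketch assumes the deviation is measured as the difference between sample and reference quantiles at a fixed grid of probabilities. This is an assumption about the statistic, and qq_deviations is a hypothetical helper.

```python
import numpy as np

def qq_deviations(sample, reference):
    """Largest positive and negative gaps between matched quantiles,
    plus the probability level at which the largest positive gap occurs."""
    probs = np.linspace(0.01, 0.99, 99)
    diff = np.quantile(sample, probs) - np.quantile(reference, probs)
    return diff.max(), diff.min(), probs[diff.argmax()]

# Toy data: the sample equals the reference except for a stretched upper tail.
reference = np.linspace(-3.0, 3.0, 1001)
sample = reference.copy()
sample[reference > 2.0] += 1.0

max_dev, min_dev, where = qq_deviations(sample, reference)
print(max_dev, min_dev, where)
```

In this toy case the positive deviation appears only at high probability levels, that is at the right of the plot, which corresponds to the heavy positive tail interpretation.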

Figure 20: ACF (top left), periodogram (top right) and quantile-quantile (bottom left) uncertainty plots. The blue line and shading are the pointwise mean and 90% confidence interval of the plots under the prior distribution for component 1. The green line and green dashed lines are the corresponding quantities under the posterior.

4.1.2 Component 2 : A constant. This function applies from 06 Jun 2015 until 16 Jul 2015

The following discrepancies between the prior and posterior distributions for this component have been detected.

  • The qq plot has an unexpectedly large positive deviation from equality (x=y). This discrepancy has an estimated p-value of 0.001.

  • The maximum value of the periodogram is unexpectedly high. This discrepancy has an estimated p-value of 0.004.

The positive deviation in the qq-plot can indicate heavy positive tails if it occurs at the right of the plot or light negative tails if it occurs at the left. The large maximum value of the periodogram can indicate periodicity that is not being captured by the model.

Figure 21: ACF (top left), periodogram (top right) and quantile-quantile (bottom left) uncertainty plots. The blue line and shading are the pointwise mean and 90% confidence interval of the plots under the prior distribution for component 2. The green line and green dashed lines are the corresponding quantities under the posterior.

4.2 Moderately statistically significant discrepancies

4.2.1 Component 3 : A smooth function. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards

The following discrepancies between the prior and posterior distributions for this component have been detected.

  • The qq plot has an unexpectedly large positive deviation from equality (x=y). This discrepancy has an estimated p-value of 0.033.

The positive deviation in the qq-plot can indicate heavy positive tails if it occurs at the right of the plot or light negative tails if it occurs at the left.

Figure 22: ACF (top left), periodogram (top right) and quantile-quantile (bottom left) uncertainty plots. The blue line and shading are the pointwise mean and 90% confidence interval of the plots under the prior distribution for component 3. The green line and green dashed lines are the corresponding quantities under the posterior.

4.3 Model checking plots for components without statistically significant discrepancies

4.3.1 Component 4 : Uncorrelated noise. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards

No discrepancies between the prior and posterior of this component have been detected.

Figure 23: ACF (top left), periodogram (top right) and quantile-quantile (bottom left) uncertainty plots. The blue line and shading are the pointwise mean and 90% confidence interval of the plots under the prior distribution for component 4. The green line and green dashed lines are the corresponding quantities under the posterior.

4.3.2 Component 5 : Uncorrelated noise. This function applies from 26 Sep 2015 until 09 Oct 2015

No discrepancies between the prior and posterior of this component have been detected.

Figure 24: ACF (top left), periodogram (top right) and quantile-quantile (bottom left) uncertainty plots. The blue line and shading are the pointwise mean and 90% confidence interval of the plots under the prior distribution for component 5. The green line and green dashed lines are the corresponding quantities under the posterior.

4.3.3 Component 6 : A very smooth function. This function applies from 26 Sep 2015 until 09 Oct 2015

No discrepancies between the prior and posterior of this component have been detected.

Figure 25: ACF (top left), periodogram (top right) and quantile-quantile (bottom left) uncertainty plots. The blue line and shading are the pointwise mean and 90% confidence interval of the plots under the prior distribution for component 6. The green line and green dashed lines are the corresponding quantities under the posterior.

5 MMD - experimental section

# mmd
1 0.000
2 0.000
3 0.000
4 0.072
5 0.397
6 0.133
Table 3: MMD p-values
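
This section does not describe how the MMD p-values are obtained. As a rough illustration only, the sketch below computes a biased estimate of the squared maximum mean discrepancy with a Gaussian kernel and a permutation p-value. The kernel, bandwidth, number of permutations and function names are all assumptions, and the data are synthetic.

```python
import numpy as np

def rbf_gram(a, b, bandwidth):
    """Gaussian kernel matrix between 1-D samples a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between x and y."""
    return (rbf_gram(x, x, bandwidth).mean()
            + rbf_gram(y, y, bandwidth).mean()
            - 2.0 * rbf_gram(x, y, bandwidth).mean())

def mmd_permutation_p(x, y, bandwidth=1.0, n_perm=200, seed=0):
    """Permutation p-value: how often a random relabelling gives an MMD at least as large."""
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, bandwidth)
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if mmd2_biased(perm[: len(x)], perm[len(x):], bandwidth) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Synthetic samples: the second is the first shifted by five units.
rng = np.random.default_rng(42)
x = rng.standard_normal(50)
print(mmd_permutation_p(x, x + 5.0))
```

A small p-value, as for components 1 to 3 in Table 3, indicates that the two sets of samples being compared are distinguishable under the chosen kernel.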

5.0.1 Component 1 : A smooth function. This function applies until 06 Jun 2015 and from 16 Jul 2015 onwards

Figure 26: MMD plot

5.0.2 Component 2 : A constant. This function applies from 06 Jun 2015 until 16 Jul 2015

Figure 27: MMD plot

5.0.3 Component 3 : A smooth function. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards

Figure 28: MMD plot

5.0.4 Component 4 : Uncorrelated noise. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015 and from 09 Oct 2015 onwards

Figure 29: MMD plot

5.0.5 Component 5 : Uncorrelated noise. This function applies from 26 Sep 2015 until 09 Oct 2015

Figure 30: MMD plot

5.0.6 Component 6 : A very smooth function. This function applies from 26 Sep 2015 until 09 Oct 2015

Figure 31: MMD plot