02. Time Series Analysis

Haiyue, August 9, 2023

Key Points

Trend
Seasonality
ARMA
ARCH & GARCH
Rain model
Diffuse Solar Radiation
Synthetic Generation

Basic but Important Definitions

The power spectrum (a.k.a., autospectrum or autospectral density) is the distribution of the total variance of a time series over frequency.
In other words, if a time series is broken down into a weighted sum of sine waves, the power spectrum shows how much of the series' variance belongs to each frequency.

A time series can be broken down into a weighted sum of sine waves. Each wave has three important features: frequency, amplitude, and phase. Starting from the most important frequencies and adding in all the others, the weights across all frequencies form the entire power spectrum; note that the phase information is discarded. Smoothing the spectrum a bit and taking its logarithm gives the log power spectrum, which shows the relative importance of each frequency component to the overall signal.
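The description above can be sketched with NumPy's FFT. This is a minimal periodogram (squared FFT magnitudes of the demeaned series), which is only one of several power-spectrum estimators; the helper name `power_spectrum`, the 100 Hz sampling rate, and the two-sine test signal are illustrative assumptions. The last lines check the "distribution of total variance" reading: summed over the full two-sided FFT and normalized by n², the power equals the sample variance (Parseval's theorem).

```python
import numpy as np

def power_spectrum(x, fs=1.0):
    """One-sided periodogram: squared FFT magnitude of the demeaned series.
    The phase (angle of each FFT coefficient) is discarded."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean so frequency 0 carries no power
    n = len(x)
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, power

# Illustrative signal: a strong 5 Hz wave plus a weaker 12 Hz wave, sampled at 100 Hz
fs, n = 100.0, 200
t = np.arange(n) / fs
x = 3 * np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 12 * t)

freqs, power = power_spectrum(x, fs)
log_power = np.log10(power + 1e-12)       # log power spectrum (small offset avoids log(0))
print(freqs[np.argmax(power)])            # dominant frequency: 5.0

# Parseval: the two-sided spectrum, normalized by n^2, sums to the variance
d = x - x.mean()
total = np.sum(np.abs(np.fft.fft(d)) ** 2) / n ** 2
print(np.isclose(total, d.var()))         # True
```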

What is the Fourier Transform

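Any (well-behaved) periodic signal can be broken down into a sum of weighted sine waves. The Fourier transform computes the weight (amplitude) and the phase of each of these frequency components.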
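To make "weighted sum of sine waves" concrete, here is a naive discrete Fourier transform written directly from its definition, $X_k = \sum_{j=0}^{n-1} x_j e^{-2\pi i jk/n}$, and checked against NumPy's `np.fft.fft`. The function name `naive_dft` and the sample values are illustrative. The magnitude of each coefficient gives the weight of one frequency and its angle gives the phase; the inverse transform adds the waves back up to recover the data.

```python
import numpy as np

def naive_dft(x):
    """O(n^2) DFT from the definition: X_k = sum_j x_j * exp(-2*pi*i*j*k/n)."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    j = np.arange(n)
    k = j.reshape(-1, 1)                       # column vector of frequency indices
    basis = np.exp(-2j * np.pi * k * j / n)    # row k holds the k-th complex sinusoid
    return basis @ x

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5])
X = naive_dft(x)
amplitudes = np.abs(X)     # weight of each frequency component
phases = np.angle(X)       # phase of each frequency component, in radians

print(np.allclose(X, np.fft.fft(x)))          # True: matches NumPy's FFT
print(np.allclose(np.fft.ifft(X).real, x))    # True: summing the waves recovers x
```

The O(n²) matrix product is only for illustration; the FFT computes the same coefficients in O(n log n).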

What is seasonality

Seasonality is the presence of variations that occur at specific regular intervals (hourly, daily, weekly, monthly, yearly, etc.).
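One simple way to see a seasonal pattern is to remove the trend and then average the values at each position within the cycle (every Monday, every Tuesday, and so on). The sketch below assumes an additive pattern, a known period, and a linear trend; `seasonal_profile` and the synthetic weekly data are hypothetical. Libraries such as statsmodels offer fuller decompositions (for example `statsmodels.tsa.seasonal.seasonal_decompose`).

```python
import numpy as np

def seasonal_profile(x, period):
    """Estimate an additive seasonal pattern: remove a linear trend (assumption),
    then average the detrended values at each position within the cycle."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    detrended = x - (slope * t + intercept)
    profile = np.array([detrended[i::period].mean() for i in range(period)])
    return profile - profile.mean()            # seasonal effects sum to zero

# Hypothetical daily data: 20 weeks of a weekly pattern, an upward trend, and noise
rng = np.random.default_rng(0)
weekly = np.array([0, 1, 2, 3, 2, 8, 9], dtype=float)
x = np.tile(weekly, 20) + 0.1 * np.arange(140) + rng.normal(0, 0.5, 140)
print(np.round(seasonal_profile(x, 7), 1))     # should resemble weekly - weekly.mean()
```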

What is Autocorrelation (ACF)

A time series is a sequence of measurements of the same variable(s) made over time. The correlation between values of the series that are k time steps apart (the lag), viewed as a function of k, is called the autocorrelation function (ACF).

In other words

Autocorrelation represents the degree of similarity between a given time series and a lagged version of itself over successive time intervals.

Autocorrelation measures the relationship between a variable’s current value and its past values.

An autocorrelation of +1 represents a perfect positive correlation, while an autocorrelation of -1 represents a perfect negative correlation.
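The sample ACF at lag k compares the demeaned series with a copy of itself shifted by k steps. Below is a minimal NumPy version of the common estimator $r_k = \sum_t (x_t - \bar{x})(x_{t+k} - \bar{x}) / \sum_t (x_t - \bar{x})^2$; the function name `acf` and the sine-wave example are illustrative (statsmodels provides a full-featured `statsmodels.tsa.stattools.acf`). A series with a 12-step cycle is strongly negatively correlated with itself half a cycle later and strongly positively correlated one full cycle later, which is how the ACF reveals seasonality.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(max_lag + 1)])

t = np.arange(120)
x = np.sin(2 * np.pi * t / 12)     # a pattern repeating every 12 steps
r = acf(x, 12)
print(np.round(r[[0, 6, 12]], 2))  # lag 0 is 1; lag 6 strongly negative; lag 12 strongly positive
```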

Why it is important

Helps us uncover hidden patterns in our data and select the correct forecasting methods.
Helps identify seasonality in our time series data.
Analyzing the autocorrelation function (ACF) and partial autocorrelation function (PACF) together is a standard step in selecting an appropriate ARIMA model for time series prediction.
The autocorrelation analysis helps detect patterns and check for randomness.
It’s especially important when you intend to use an autoregressive–moving-average (ARMA) model for forecasting, because it helps determine the model's orders.

What are AR, MA, ARMA, ARIMA, ARCH, and GARCH

AR

Background information

In time series, we often observe similarities between past and present values. That’s because such data often exhibit autocorrelation. In other words, by knowing the price of a product today, we can often make a rough prediction about its valuation tomorrow. So, in this tutorial, we’re going to discuss a model that reflects this correlation: the autoregressive model.

The Autoregressive Model, or AR model for short, relies only on past period values to predict current ones. It’s a linear model, where current period values are a sum of past outcomes multiplied by a numeric factor.

What it looks like

$X_t = C + \phi_1 X_{t-1} + \epsilon_t$

$X_{t-1}$ : represents the value of X during the previous period.
$\phi_1$ : the coefficient, a numeric constant by which we multiply the lagged variable $X_{t-1}$.
$\epsilon_t$ : called the residual, it represents the difference between our prediction for period t and the actual value ($\epsilon_t = X_t - \hat{X}_t$). These residuals are usually unpredictable, because any pattern in them would already be captured by the other components of the model.

Autoregressive Model with More Lags

From a mathematical point of view, a model using two lags (AR(2)) would look as follows:

$X_t = C + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \epsilon_t$
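The AR(2) equation can be simulated directly, and because it is linear in its coefficients, $C$, $\phi_1$ and $\phi_2$ can be recovered by ordinary least squares on the lagged values. This is a minimal NumPy sketch: the parameter values (chosen so the process is stationary), the Gaussian noise, and the helper names are assumptions, and dedicated libraries usually fit such models by maximum likelihood rather than plain least squares.

```python
import numpy as np

def simulate_ar2(c, phi1, phi2, n, sigma=1.0, seed=0):
    """Simulate X_t = c + phi1*X_{t-1} + phi2*X_{t-2} + eps_t with Gaussian eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n)
    x = np.zeros(n)
    for t in range(2, n):
        x[t] = c + phi1 * x[t - 1] + phi2 * x[t - 2] + eps[t]
    return x

def fit_ar2(x):
    """Least-squares estimate of (c, phi1, phi2) from the lagged values."""
    y = x[2:]                                                   # X_t
    design = np.column_stack([np.ones(len(y)), x[1:-1], x[:-2]])  # 1, X_{t-1}, X_{t-2}
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

x = simulate_ar2(c=0.5, phi1=0.6, phi2=-0.3, n=5000)
print(np.round(fit_ar2(x), 2))   # should be close to [0.5, 0.6, -0.3]
```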

MA

Background information

In time series, we sometimes observe similarities between past errors and present values. That’s because certain unpredictable events happen, and they need to be accounted for.
In other words, by knowing how far off our estimation yesterday was, compared to the actual value, we can tweak our model, so that it responds accordingly.

What it looks like

Let’s suppose that “r” is some time-series variable, like returns. Then, a simple Moving Average (MA) model looks like this:
$r_t = c + \theta_1 \epsilon_{t-1} + \epsilon_t$

$r_t$ : represents the value of “r” in the current period t.
$c$ : stands for a constant factor.
$\theta_1$ : numeric coefficient for the residual from the 1st lag ($\epsilon_{t-1}$).
$\epsilon_t$ and $\epsilon_{t-1}$ : represent the residuals for the current and the previous period, respectively.
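A quick simulation shows what the MA(1) equation implies: consecutive values $r_t$ and $r_{t-1}$ share the shock $\epsilon_{t-1}$, so the lag-1 autocorrelation should be about $\theta_1 / (1 + \theta_1^2)$, while values two or more steps apart share no shock and are nearly uncorrelated. The parameter values, Gaussian shocks, sample size, and the helper name `lag_corr` are arbitrary choices for illustration.

```python
import numpy as np

def lag_corr(x, k):
    """Sample autocorrelation of x at lag k."""
    d = x - x.mean()
    return np.dot(d[: len(d) - k], d[k:]) / np.dot(d, d)

rng = np.random.default_rng(1)
c, theta1, n = 0.0, 0.7, 20000
eps = rng.standard_normal(n + 1)
# r_t = c + theta1 * eps_{t-1} + eps_t, for t = 1..n
r = c + theta1 * eps[:-1] + eps[1:]

print(round(lag_corr(r, 1), 2), round(theta1 / (1 + theta1 ** 2), 2))  # sample vs. theory
print(round(lag_corr(r, 2), 2))   # near 0: an MA(1) only "remembers" one past shock
```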

ARMA

Background information

In time series, we often rely on past data to make estimates about current and future values. However, sometimes that’s not enough. When unexpected events like natural disasters, financial crises, or even wars happen, there can be a sudden shift in values. That's why we need models that simultaneously use past data as a foundation for estimates, but can also quickly adjust to unpredictable shocks.

Autoregressive Moving Average (ARMA) is a model that takes into account past values, as well as past errors, when constructing future estimates. It comes from merging two simpler models: the Autoregressive, or AR, and the Moving Average, or MA.

What an ARMA model looks like

Let’s suppose that “y” is some random time-series variable. A simple Autoregressive Moving Average model would look like:

$y_t = c + \phi_1 y_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_t$

$y_t$ : value in the current period.
$y_{t-1}$ : value one period ago.
$\epsilon_t$ : error term for the current period.
$\epsilon_{t-1}$ : error term one period ago.
$c$ : a baseline constant factor.
$\phi_1$ : expresses, on average, what part of last period's value $y_{t-1}$ is relevant in explaining the current one.
$\theta_1$ : expresses, on average, what part of last period's error $\epsilon_{t-1}$ is relevant in explaining the current one.

Tips

The error term from the last period is used to help us correct our predictions.
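The tip above can be made concrete with one-step-ahead ARMA(1,1) forecasts: each forecast uses last period's value and last period's forecast error, and that error is updated as soon as the actual value arrives. In this NumPy sketch the parameters are assumed known (in practice they are estimated), the forecasts start from the unconditional mean $c / (1 - \phi_1)$, and the helper name `arma11_one_step_forecasts` is hypothetical. Because the simulation uses the same parameters and starting point, the recovered errors match the simulated shocks.

```python
import numpy as np

def arma11_one_step_forecasts(y, c, phi1, theta1):
    """y_hat_t = c + phi1*y_{t-1} + theta1*e_{t-1}, where e_{t-1} = y_{t-1} - y_hat_{t-1}."""
    y_hat = np.zeros(len(y))
    err = np.zeros(len(y))
    y_hat[0] = c / (1 - phi1)          # start from the unconditional mean (assumption)
    err[0] = y[0] - y_hat[0]
    for t in range(1, len(y)):
        y_hat[t] = c + phi1 * y[t - 1] + theta1 * err[t - 1]   # correct with last error
        err[t] = y[t] - y_hat[t]
    return y_hat, err

# Simulate an ARMA(1,1) with known parameters
rng = np.random.default_rng(2)
c, phi1, theta1, n = 1.0, 0.5, 0.4, 2000
eps = rng.standard_normal(n)
y = np.zeros(n)
y[0] = c / (1 - phi1) + eps[0]
for t in range(1, n):
    y[t] = c + phi1 * y[t - 1] + theta1 * eps[t - 1] + eps[t]

y_hat, err = arma11_one_step_forecasts(y, c, phi1, theta1)
print(np.allclose(err, eps))   # True: the forecast errors are exactly the simulated shocks
```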

ARIMA

Background information

In our previous tutorial, we became familiar with the ARMA model. But did you know that we can expand the ARMA model to handle non-stationary data?
Well, that’s exactly what we’re going to cover in this post: the intuition behind the ARIMA model, the notation that goes with it, and how it differs from the ARMA model.

An ARIMA model has three orders: p, d, and q (ARIMA(p,d,q)).

The “p” and “q” represent the autoregressive (AR) and moving average (MA) lags, just like in ARMA models.
The “d” order is the integration order. It represents the number of times we need to difference the time series to make it stationary, but more on that in just a bit.

What does a simple ARIMA (1,1,1) look like?

Consider the case where all orders equal 1. Suppose P is the price variable we’re trying to model. Then, the simple ARIMA equation for P would look as follows:
$\Delta P_t = c + \phi_1 \Delta P_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_t$

$P_t$ and $P_{t-1}$ : represent the values in the current period and 1 period ago, respectively.
$\epsilon_t$ and $\epsilon_{t-1}$ : are the error terms for the same two periods.
$c$ : is just a baseline constant factor.
$\phi_1$ and $\theta_1$ : express what parts of last period's price change ($\Delta P_{t-1}$) and error ($\epsilon_{t-1}$) are relevant in estimating the current one.
$\Delta P_t$ : the difference between the price in period “t” and the price in the preceding period ($\Delta P_t = P_t - P_{t-1}$); $\Delta P_{t-1}$ is the same difference one period earlier.
$\Delta P$ : is an entire time series, which represents the disparity between prices of consecutive periods.
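The "I" in ARIMA is differencing and its inverse. The sketch below, assuming a hypothetical random-walk price series, takes first differences ($d = 1$), which is the series an ARMA(1,1) would then model in ARIMA(1,1,1), and shows that a cumulative sum undoes the differencing. It uses only NumPy; in statsmodels, `ARIMA(P, order=(1, 1, 1))` from `statsmodels.tsa.arima.model` handles this differencing internally.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical prices: a random walk with drift 0.1 per step (the level is non-stationary)
P = 100 + np.cumsum(0.1 + rng.standard_normal(500))

# d = 1: first differences, Delta P_t = P_t - P_{t-1}
dP = np.diff(P)

# Undo the differencing ("integrate") by cumulative summation from the first price
P_back = np.concatenate([[P[0]], P[0] + np.cumsum(dP)])
print(np.allclose(P_back, P))   # True
```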

ARCH

Autoregressive conditional heteroskedasticity (ARCH) is a statistical model used to analyze volatility in time series in order to forecast future volatility. In the financial world, ARCH modeling is used to estimate risk by providing a model of volatility that more closely resembles real markets. ARCH modeling shows that periods of high volatility are followed by more high volatility and periods of low volatility are followed by more low volatility.

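$Z_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 Z_{t-1}^2 + \cdots + \alpha_m Z_{t-m}^2$

where $\varepsilon_t$ is i.i.d. noise with mean 0 and variance 1, $\alpha_0 > 0$, and $\alpha_i \ge 0$.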
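Simulating an ARCH(1) process (m = 1 in the equation above) illustrates volatility clustering: the values $Z_t$ themselves are nearly uncorrelated, but their squares are positively autocorrelated, because a large shock raises the next period's variance. The parameter values, Gaussian $\varepsilon_t$, and helper names are assumptions for illustration; $\alpha_1 = 0.3$ keeps the fourth moment finite.

```python
import numpy as np

def simulate_arch1(alpha0, alpha1, n, seed=0):
    """Simulate Z_t = sigma_t * eps_t with sigma_t^2 = alpha0 + alpha1 * Z_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    z = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = alpha0 / (1 - alpha1)        # start at the unconditional variance
    z[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * z[t - 1] ** 2
        z[t] = np.sqrt(sigma2[t]) * eps[t]
    return z, sigma2

def lag1_corr(x):
    """Sample autocorrelation at lag 1."""
    d = x - x.mean()
    return np.dot(d[:-1], d[1:]) / np.dot(d, d)

z, sigma2 = simulate_arch1(alpha0=0.2, alpha1=0.3, n=20000)
print(round(lag1_corr(z), 2))        # near 0: the values themselves are uncorrelated
print(round(lag1_corr(z ** 2), 2))   # clearly positive: volatility clusters
```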

Problems with ARCH

- The model assumes that positive and negative shocks have the same effect on volatility, because volatility depends on the squares of previous shocks. This is often unrealistic, since in practice positive and negative shocks can affect volatility differently.
- For an ARCH(1) model to have a finite fourth moment, $\alpha_1^2$ must lie in the interval $[0, 1/3)$. This restricts the model's ability to deal with leptokurtic (heavy-tailed) series.
- It often requires many parameters to describe the volatility process of a series.

GARCH

Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) is a statistical model used in analyzing time-series data where the error variance is believed to be serially autocorrelated. GARCH models assume that the variance of the error term follows an autoregressive moving average process.

GARCH is a statistical modeling technique used to help predict the volatility of returns on financial assets.
GARCH is appropriate for time series data where the variance of the error term is serially autocorrelated following an autoregressive moving average process.
GARCH is useful to assess risk and expected returns for assets that exhibit clustered periods of volatility in returns.

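Equivalently, with innovations $\eta_t = Z_t^2 - \sigma_t^2$, the squared series follows an ARMA model:

$Z_t^2 = \phi_0 + \sum_{i=1}^{p}\phi_i Z_{t-i}^2 - \sum_{j=1}^{q}\theta_j \eta_{t-j} + \eta_t$

Then, assuming $p \ge q$ and taking $\theta_i = 0$ for $i > q$, the GARCH estimates are given by $\alpha_0 = \phi_0$, $\beta_i = \theta_i$ and $\alpha_i = \phi_i - \theta_i$, so that the conditional variance follows $\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p}\alpha_i Z_{t-i}^2 + \sum_{j=1}^{q}\beta_j \sigma_{t-j}^2$.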
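The link between the two parameterizations can be checked numerically for GARCH(1,1), whose variance equation is $\sigma_t^2 = \alpha_0 + \alpha_1 Z_{t-1}^2 + \beta_1 \sigma_{t-1}^2$. The sketch simulates that recursion and then confirms that $Z_t^2$ also satisfies an ARMA(1,1) equation with $\phi_0 = \alpha_0$, $\phi_1 = \alpha_1 + \beta_1$, $\theta_1 = \beta_1$ and innovations $\eta_t = Z_t^2 - \sigma_t^2$, written with a minus sign on the $\theta_1$ term. The parameter values are arbitrary choices satisfying $\alpha_1 + \beta_1 < 1$.

```python
import numpy as np

# GARCH(1,1): sigma_t^2 = alpha0 + alpha1 * Z_{t-1}^2 + beta1 * sigma_{t-1}^2
alpha0, alpha1, beta1, n = 0.1, 0.1, 0.8, 1000
rng = np.random.default_rng(4)
eps = rng.standard_normal(n)
z = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = alpha0 / (1 - alpha1 - beta1)    # unconditional variance
z[0] = np.sqrt(sigma2[0]) * eps[0]
for t in range(1, n):
    sigma2[t] = alpha0 + alpha1 * z[t - 1] ** 2 + beta1 * sigma2[t - 1]
    z[t] = np.sqrt(sigma2[t]) * eps[t]

# ARMA(1,1) form of Z_t^2 with eta_t = Z_t^2 - sigma_t^2
phi0, phi1, theta1 = alpha0, alpha1 + beta1, beta1
eta = z ** 2 - sigma2
rhs = phi0 + phi1 * z[:-1] ** 2 - theta1 * eta[:-1] + eta[1:]
print(np.allclose(z[1:] ** 2, rhs))   # True: both forms describe the same series
print(theta1, phi1 - theta1)          # recovers beta1 and alpha1 (up to float rounding)
```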

Important: The relationship between AR, ARMA, ARCH and GARCH

In econometrics, the autoregressive conditional heteroskedasticity (ARCH) model is a statistical model for time series data that describes the variance of the current error term or innovation as a function of the actual sizes of the previous time periods' error terms; often the variance is related to the squares of the previous innovations.

The ARCH model is appropriate when the error variance in a time series follows an autoregressive (AR) model; if an autoregressive moving average (ARMA) model is assumed for the error variance, the model is a generalized autoregressive conditional heteroskedasticity (GARCH) model.

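What is the gamma distribution

The gamma distribution is a two-parameter family of continuous probability distributions on positive values. Among distributions of a positive random variable with fixed $E[X]$ and $E[\ln X]$, it is the one with maximum entropy. It is defined by a shape parameter $k$ and a scale parameter $\theta$. Equivalently, it is often written with $\alpha = k$ in place of $k$ and $\beta = \frac{1}{\theta}$ in place of $\theta$, where $\beta$ is called the rate parameter.

$f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\theta^{k}\,\Gamma(k)}, \qquad f(x; \alpha, \beta) = \frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}, \qquad x > 0$

$\Gamma(\alpha) = \int_{0}^{\infty} t^{\alpha-1} e^{-t}\,dt$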
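A small numerical check of the shape/rate density using only the standard library: it should integrate to 1, and its mean should equal $\alpha / \beta$ (equivalently $k\theta$). The parameter values and the midpoint-rule integration over (0, 50] are illustrative choices; the tail beyond 50 is negligible for these parameters.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta (beta = 1 / theta)."""
    if x <= 0:
        return 0.0
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

alpha, beta = 2.0, 0.5          # shape k = 2, scale theta = 2
dx = 1e-3
xs = [(i + 0.5) * dx for i in range(int(50 / dx))]   # midpoints on (0, 50]
total = sum(gamma_pdf(x, alpha, beta) for x in xs) * dx
mean = sum(x * gamma_pdf(x, alpha, beta) for x in xs) * dx
print(round(total, 3), round(mean, 3))   # 1.0 and alpha / beta = 4.0
```

For real work, SciPy provides the same density as `scipy.stats.gamma.pdf(x, a=alpha, scale=1/beta)`; note that SciPy uses the shape/scale form.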

What are empirical cumulative distribution functions (ECDF)

What is the likelihood

What are Error Metrics

When considering the performance of any forecasting model, the prediction values it produces must be evaluated. This is done by calculating suitable error metrics. An error metric is a way to quantify the performance of a model and provides a way for the forecaster to quantitatively compare different models. They give us a way to more objectively gauge how well the model executes its tasks.
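Three widely used error metrics are the mean absolute error (MAE), the root mean squared error (RMSE), and the mean absolute percentage error (MAPE); lower values are better for all three. The sketch computes them with NumPy for two hypothetical forecasts of the same data. The helper name `error_metrics` and the numbers are illustrative, and the choice of metrics is an example rather than a complete list.

```python
import numpy as np

def error_metrics(actual, forecast):
    """MAE, RMSE and MAPE for a forecast (MAPE requires nonzero actual values)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = actual - forecast
    return {
        "MAE": np.mean(np.abs(err)),                   # average size of the errors
        "RMSE": np.sqrt(np.mean(err ** 2)),            # penalizes large errors more
        "MAPE": np.mean(np.abs(err / actual)) * 100,   # error as a percentage of actual
    }

actual = [100, 110, 120, 130]
model_a = [102, 108, 125, 128]
model_b = [100, 115, 115, 140]
print(error_metrics(actual, model_a))
print(error_metrics(actual, model_b))
```

Here `model_a` has the lower MAE (2.75 versus 5.0) and is also lower on RMSE and MAPE, so these metrics rank it ahead of `model_b`.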
