Linear Regression Curve

Haiyue

Linear Regression Curve (LRC)

Indicator Overview
  • Category: Trend — Overlay Indicator
  • Default Parameters: period=20
  • Output: Fitted linear regression value at each time point (connected into a curve)
  • Applicable Markets: Stocks, Futures, Forex, Cryptocurrencies

I. What is the Linear Regression Curve

The Linear Regression Curve (LRC) is a technical indicator that applies ordinary least squares (OLS) linear regression, a classic statistical method, to financial time series. At each time point, it fits a regression line to the most recent $n$ closing prices and outputs the fitted value at the current bar. Connecting these fitted values produces a smooth trend curve.

The least squares method behind it was developed independently by Adrien-Marie Legendre, who published it in 1805, and Carl Friedrich Gauss, who published it in 1809. Its use in technical analysis spread as computers became widely available.

Relationship to LSMA

The Linear Regression Curve is numerically identical to the Least Squares Moving Average (LSMA): both take the fitted value at the end of a rolling regression window. The difference lies mainly in naming and conceptual emphasis:

  • LSMA: Emphasizes the “moving average” concept, highlighting it as a smoothing method with reduced lag
  • LRC: Emphasizes the “regression fitting” concept, focusing on trend slope and goodness of fit

The Linear Regression Family

Linear regression-based technical indicators include: Linear Regression Curve (LRC), Linear Regression Slope (LR Slope), Linear Regression Channel (LR Channel), Standard Error Bands (SE Bands), and R-Squared. This article focuses on LRC.

II. Mathematical Principles and Calculation

Ordinary Least Squares

Given a rolling window of length $n$ with price sequence $\{y_1, y_2, \dots, y_n\}$ and a corresponding time index $\{x_1, x_2, \dots, x_n\}$ as the independent variable (typically $x_i = i$), the linear regression model is:

$$\hat{y} = a + b \cdot x$$

The least squares solution for slope $b$ and intercept $a$:

$$b = \frac{n\sum_{i=1}^{n}x_i y_i - \sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}$$

$$a = \bar{y} - b\bar{x}$$

Where $\bar{x} = \frac{1}{n}\sum x_i$ and $\bar{y} = \frac{1}{n}\sum y_i$.
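The closed-form solution above can be checked on a small window. The following is a minimal sketch, assuming $x_i = i$; the helper name `ols_fit` and the sample prices are made up for illustration, and `np.polyfit` is used only as an independent cross-check.

```python
import numpy as np

def ols_fit(y):
    """Closed-form OLS fit of y against x = 1..n. Returns (intercept a, slope b)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = np.arange(1, n + 1, dtype=float)
    # Slope from the summation formula, intercept from the means
    b = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x ** 2) - x.sum() ** 2)
    a = y.mean() - b * x.mean()
    return a, b

# Hypothetical 5-bar window of closing prices
y = [10.0, 10.5, 10.2, 11.0, 11.4]
a, b = ols_fit(y)

# np.polyfit returns coefficients from highest degree down: (slope, intercept)
b_ref, a_ref = np.polyfit(np.arange(1, 6), y, deg=1)
print(f"a={a:.4f}, b={b:.4f}")  # a=9.6300, b=0.3300
```

Both routes should agree to floating-point precision; the summation form is convenient because the $x$-related sums depend only on $n$.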

Linear Regression Curve Value

The LRC output at time $t$ is the regression line’s value at the window’s endpoint:

$$LRC_t = a + b \cdot n$$

That is, after fitting a line on the window $[t-n+1,\ t]$, take the value at $x = n$.
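To make the endpoint rule concrete, here is a small sketch for a single window. The function name `lrc_endpoint` and the two sample windows are illustrative assumptions. On perfectly linear data the endpoint value equals the last price; on noisy data it is the fitted line's estimate of where the trend currently sits.

```python
import numpy as np

def lrc_endpoint(window):
    """Fit y = a + b*x on x = 1..n and return the fitted value at x = n."""
    y = np.asarray(window, dtype=float)
    n = len(y)
    x = np.arange(1, n + 1, dtype=float)
    b, a = np.polyfit(x, y, deg=1)  # (slope, intercept)
    return a + b * n

linear = [100, 102, 104, 106, 108]   # exact straight line
noisy = [100, 103, 103, 107, 108]    # same direction, with wiggles

lrc_linear = lrc_endpoint(linear)    # equals the last price, 108
lrc_noisy = lrc_endpoint(noisy)      # 108.2 (slope 2.0, intercept 98.2)
```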

Simplified Formulas

Since $x_i = 1, 2, \dots, n$, we can pre-compute:

$$\sum_{i=1}^{n} x_i = \frac{n(n+1)}{2}, \quad \sum_{i=1}^{n} x_i^2 = \frac{n(n+1)(2n+1)}{6}$$

The slope can therefore be simplified to:

$$b = \frac{n\sum_{i=1}^{n}i \cdot y_i - \frac{n(n+1)}{2}\sum_{i=1}^{n}y_i}{n \cdot \frac{n(n+1)(2n+1)}{6} - \left(\frac{n(n+1)}{2}\right)^2}$$
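As a further worked step (it follows directly from the two sums above), the denominator collapses to a single term:

$$n \cdot \frac{n(n+1)(2n+1)}{6} - \left(\frac{n(n+1)}{2}\right)^2 = \frac{n^2(n+1)}{12}\bigl[2(2n+1) - 3(n+1)\bigr] = \frac{n^2(n^2-1)}{12}$$

Dividing numerator and denominator by $n$ then gives a compact form of the slope:

$$b = \frac{12\sum_{i=1}^{n} i \cdot y_i - 6(n+1)\sum_{i=1}^{n} y_i}{n(n^2-1)}$$

For example, with $n = 20$ the denominator of the unsimplified formula is $20 \cdot 2870 - 210^2 = 13300$, which matches $\frac{20^2(20^2-1)}{12} = 13300$.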

Significance of the Slope

The slope $b$ directly reflects trend direction and strength:

  • $b > 0$: Price exhibits an uptrend within the window
  • $b < 0$: Price exhibits a downtrend within the window
  • The larger $|b|$, the steeper the trend

Calculation Steps

  1. Set the regression window length $n$ (default 20)
  2. For each time point $t$, take the closing prices in the interval $[t-n+1,\ t]$
  3. Fit $y = a + bx$ using least squares
  4. Output $LRC_t = a + b \cdot n$ (the fitted value at the window’s endpoint)
  5. Optionally output the slope $b$ and R-Squared for supplementary analysis

III. Python Implementation

```python
import numpy as np
import pandas as pd

def linear_regression_curve(df: pd.DataFrame,
                            period: int = 20,
                            column: str = 'close') -> pd.DataFrame:
    """
    Calculate the Linear Regression Curve (LRC) and related statistics.

    Parameters
    ----------
    df : DataFrame, must contain the specified price column
    period : Regression window length, default 20
    column : Column name for regression, default 'close'

    Returns
    -------
    DataFrame with 'lrc' (regression value), 'slope', 'intercept',
    and 'r_squared' (R^2) columns
    """
    prices = df[column].values
    n = len(prices)

    lrc = np.full(n, np.nan)
    slope = np.full(n, np.nan)
    intercept = np.full(n, np.nan)
    r_squared = np.full(n, np.nan)

    # Pre-compute x-related constants (identical for every window)
    x = np.arange(1, period + 1, dtype=float)
    sum_x = x.sum()
    sum_x2 = (x ** 2).sum()
    mean_x = x.mean()
    denominator = period * sum_x2 - sum_x ** 2

    for i in range(period - 1, n):
        y = prices[i - period + 1: i + 1]
        mean_y = y.mean()
        sum_xy = np.sum(x * y)
        sum_y = y.sum()

        # Slope and intercept
        b = (period * sum_xy - sum_x * sum_y) / denominator
        a = mean_y - b * mean_x

        # LRC value (fitted value at window endpoint)
        lrc[i] = a + b * period

        slope[i] = b
        intercept[i] = a

        # R-Squared
        y_hat = a + b * x
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - mean_y) ** 2)
        if ss_tot > 0:
            r_squared[i] = 1 - ss_res / ss_tot
        else:
            # All values are identical: the flat line fits exactly,
            # so R^2 is reported as 1.0 by convention
            r_squared[i] = 1.0

    result = pd.DataFrame({
        'lrc': lrc,
        'slope': slope,
        'intercept': intercept,
        'r_squared': r_squared
    }, index=df.index)

    return result


def linear_regression_curve_vectorized(df: pd.DataFrame,
                                       period: int = 20,
                                       column: str = 'close') -> pd.Series:
    """
    Vectorized version using pandas rolling + apply.
    Returns only the LRC values, suitable for fast computation.
    """
    x = np.arange(1, period + 1, dtype=float)
    sum_x = x.sum()
    sum_x2 = (x ** 2).sum()
    mean_x = x.mean()

    def _fit(y):
        sum_xy = np.sum(x * y)
        sum_y = y.sum()
        b = (period * sum_xy - sum_x * sum_y) / (period * sum_x2 - sum_x ** 2)
        a = y.mean() - b * mean_x
        return a + b * period

    return df[column].rolling(window=period).apply(_fit, raw=True)


# ========== Usage Example ==========
if __name__ == "__main__":
    np.random.seed(42)
    n = 100
    dates = pd.date_range('2024-01-01', periods=n, freq='B')

    # Simulate price data with trend
    trend = np.linspace(0, 20, n)
    cycle = 5 * np.sin(np.linspace(0, 6 * np.pi, n))
    noise = np.cumsum(np.random.randn(n) * 0.3)
    price = 100 + trend + cycle + noise

    df = pd.DataFrame({
        'open':  price + np.random.uniform(-0.5, 0.5, n),
        'high':  price + np.abs(np.random.randn(n)) * 1.2,
        'low':   price - np.abs(np.random.randn(n)) * 1.2,
        'close': price + np.random.uniform(-0.3, 0.3, n),
        'volume': np.random.randint(1000, 5000, n)
    }, index=dates)

    # Calculate LRC for multiple periods
    lrc_20 = linear_regression_curve(df, period=20)
    lrc_50 = linear_regression_curve(df, period=50)

    merged = df[['close']].copy()
    merged['lrc_20'] = lrc_20['lrc']
    merged['slope_20'] = lrc_20['slope']
    merged['r_sq_20'] = lrc_20['r_squared']
    merged['lrc_50'] = lrc_50['lrc']

    print("Linear Regression Curve data for the last 15 periods:")
    print(merged.tail(15).to_string())

    # Trend assessment example
    latest = merged.dropna().iloc[-1]
    if latest['slope_20'] > 0 and latest['r_sq_20'] > 0.7:
        print(f"\nCurrently in a strong uptrend (slope={latest['slope_20']:.4f}, R2={latest['r_sq_20']:.3f})")
    elif latest['slope_20'] < 0 and latest['r_sq_20'] > 0.7:
        print(f"\nCurrently in a strong downtrend (slope={latest['slope_20']:.4f}, R2={latest['r_sq_20']:.3f})")
    else:
        print(f"\nNo clear trend (slope={latest['slope_20']:.4f}, R2={latest['r_sq_20']:.3f})")

    # Compare lag with SMA
    merged['sma_20'] = df['close'].rolling(20).mean()
    merged['lrc_minus_sma'] = merged['lrc_20'] - merged['sma_20']
    print(f"\nAverage difference between LRC and SMA: {merged['lrc_minus_sma'].dropna().mean():.4f}")
    print("(Positive means LRC is above SMA, indicating LRC tracks price more closely in uptrends)")
```

Performance Note

The window-by-window loop version has $O(n \times \text{period})$ complexity, which can be slow for large datasets such as backtests on short-interval bars. The version built on pandas rolling may be somewhat faster, but it still calls a Python function once per window and fundamentally remains $O(n \times \text{period})$. For higher performance, consider using Numba or C extensions.
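One intermediate option, sketched here as an assumption rather than as part of the implementation above, is to build all windows at once with NumPy's `sliding_window_view` (available in NumPy 1.20 and later) so that the per-window sums run in compiled code. The arithmetic is still $O(n \times \text{period})$, but the Python-level loop disappears. The function name `lrc_windowed` is illustrative, and `period` is assumed to be at least 2.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def lrc_windowed(close: pd.Series, period: int = 20) -> pd.Series:
    """LRC endpoint values computed for all windows at once (period >= 2)."""
    y = close.to_numpy(dtype=float)
    out = np.full(len(y), np.nan)
    if len(y) < period:
        return pd.Series(out, index=close.index)

    x = np.arange(1, period + 1, dtype=float)
    windows = sliding_window_view(y, period)   # shape: (len(y) - period + 1, period)

    sum_y = windows.sum(axis=1)                # sum of prices in each window
    sum_xy = windows @ x                       # sum of i * y_i in each window
    denom = period * np.sum(x ** 2) - x.sum() ** 2

    b = (period * sum_xy - x.sum() * sum_y) / denom
    a = sum_y / period - b * x.mean()
    out[period - 1:] = a + b * period          # fitted value at the window endpoint
    return pd.Series(out, index=close.index)
```

The first `period - 1` entries stay NaN, matching the loop version's warm-up behaviour.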

IV. Problems the Indicator Solves

1. Reducing Moving Average Lag

A standard SMA has approximately $(\text{period}-1)/2$ periods of lag. LRC fits a straight line and takes the endpoint value, tracking the current price more closely in trends and effectively reducing lag.

| Indicator | Position in Uptrend | Position in Downtrend |
|---|---|---|
| SMA | Relatively far below price | Relatively far above price |
| LRC | Close to price or slightly ahead | Close to price or slightly ahead |
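The lag difference is easiest to see on an idealised straight-line ramp. The sketch below uses synthetic data (an assumption for illustration, not market data): the SMA sits $(\text{period}-1)/2$ bars behind, so its gap to price is that many bars times the slope per bar, while the LRC endpoint lands on the price itself.

```python
import numpy as np
import pandas as pd

period = 20
slope_per_bar = 0.5
# Synthetic, perfectly linear uptrend
price = pd.Series(100 + slope_per_bar * np.arange(60, dtype=float))

sma = price.rolling(period).mean()

x = np.arange(1, period + 1, dtype=float)

def _endpoint(y):
    # Fitted value of the window's regression line at its last bar
    b, a = np.polyfit(x, y, 1)
    return a + b * period

lrc = price.rolling(period).apply(_endpoint, raw=True)

sma_gap = (price - sma).iloc[-1]   # (period - 1) / 2 * slope_per_bar = 4.75
lrc_gap = (price - lrc).iloc[-1]   # zero up to floating-point error
print(f"SMA gap: {sma_gap:.2f}, LRC gap: {lrc_gap:.6f}")
```

Real prices are never this clean, so in practice the LRC gap is small in steady trends rather than exactly zero.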

2. Trend Direction and Strength Quantification

  • Slope $b > 0$: Uptrend; the larger $b$, the steeper the rise
  • Slope $b < 0$: Downtrend; the larger $|b|$, the steeper the decline
  • Slope near 0: Sideways consolidation

3. Trend Reliability Assessment

$R^2$ (coefficient of determination) measures the quality of the linear fit:

  • $R^2 > 0.8$: Highly linear price movement, strong and reliable trend
  • $R^2 \in [0.5, 0.8]$: Some trend present, but with significant fluctuations
  • $R^2 < 0.5$: No clear linear trend, market is in a ranging state
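The thresholds above can be turned into a small labelling helper. This is only a sketch: the function names `slope_and_r2` and `describe` and the two sample windows are invented for illustration, and the cut-offs are the 0.8 and 0.5 values from the list.

```python
import numpy as np

def slope_and_r2(window):
    """Return (slope, R^2) of an OLS line fitted against x = 1..n."""
    y = np.asarray(window, dtype=float)
    x = np.arange(1, len(y) + 1, dtype=float)
    b, a = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (a + b * x)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    # R^2 is undefined for a completely flat window; report NaN in that case
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return b, r2

def describe(window, strong=0.8, weak=0.5):
    """Label a window using the R^2 bands from the text."""
    b, r2 = slope_and_r2(window)
    if not r2 >= weak:              # also catches NaN
        return "ranging"
    direction = "up" if b > 0 else "down"
    strength = "strong" if r2 > strong else "moderate"
    return f"{strength} {direction}trend"

steady = [100.0, 101.2, 101.9, 103.1, 104.0]   # nearly a straight line
choppy = [100.0, 104.0, 99.0, 105.0, 100.0]    # zigzag with almost no drift

print(describe(steady))  # strong uptrend
print(describe(choppy))  # ranging
```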

4. Crossover Signals

  • Price crosses above LRC: Potential buy signal
  • Price crosses below LRC: Potential sell signal
  • Short-period LRC crosses above long-period LRC: Trend turns bullish

Filtering with R-Squared

Trusting LRC crossover signals only when $R^2 > 0.6$ can reduce false signals during choppy markets.
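A minimal sketch of this filter follows. The helper names `rolling_lrc_r2` and `filtered_cross_signals`, the signal encoding (1 for a cross above, -1 for a cross below, 0 otherwise), and the choice to measure $R^2$ on the window ending at the crossover bar are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def rolling_lrc_r2(close: pd.Series, period: int = 20):
    """Rolling LRC endpoint value and R^2 of each window."""
    x = np.arange(1, period + 1, dtype=float)

    def _lrc(y):
        b, a = np.polyfit(x, y, 1)
        return a + b * period

    def _r2(y):
        if y.max() == y.min():                 # flat window: R^2 is undefined
            return np.nan
        return np.corrcoef(x, y)[0, 1] ** 2    # equals R^2 for a one-variable fit

    roll = close.rolling(period)
    return roll.apply(_lrc, raw=True), roll.apply(_r2, raw=True)

def filtered_cross_signals(close: pd.Series, period: int = 20, r2_min: float = 0.6):
    """Price/LRC crossovers, kept only where R^2 exceeds r2_min."""
    lrc, r2 = rolling_lrc_r2(close, period)
    above = close > lrc
    prev_above = above.shift(1, fill_value=False).astype(bool)
    # Require a valid LRC on both bars so the warm-up period cannot fake a cross
    valid = lrc.notna() & lrc.shift(1).notna()
    trusted = r2 > r2_min                      # NaN compares as False

    signal = pd.Series(0, index=close.index, dtype=int)
    signal[valid & trusted & above & ~prev_above] = 1     # cross above LRC
    signal[valid & trusted & ~above & prev_above] = -1    # cross below LRC
    return signal
```

Raising `r2_min` trades fewer false signals for fewer signals overall, so the threshold is worth testing per market rather than fixing at 0.6.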

V. Advantages, Disadvantages, and Use Cases

Advantages

  1. Minimal lag: LRC has the lowest lag of the moving average-type indicators compared here (SMA, EMA, HMA); on a perfectly linear trend its endpoint value coincides with price
  2. Solid mathematical foundation: Based on the classic least squares method with strong theoretical support
  3. Multi-dimensional information: Simultaneously provides trend value, slope, and goodness of fit (R^2)
  4. Excellent in trending markets: Reflects direction changes faster than SMA/EMA in clear trends
  5. Highly extensible: Naturally extends to linear regression channels, standard error bands, and other derivatives

Disadvantages

  1. Overfitting in ranging markets: In trendless markets, LRC fluctuates frequently with price, producing many false signals
  2. Sensitive to endpoint data: The last few data points in the window disproportionately influence the fit; a single outlier can significantly alter the slope
  3. Higher computational cost: Per-window regression is more expensive than SMA and EMA
  4. “Leading” may be “overshooting”: When a trend ends, LRC keeps extrapolating the slope fitted over the window and can overshoot price, generating more extreme signals than warranted
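The endpoint sensitivity and the overshooting both follow from how LRC weights the prices in its window. Expanding $a + b \cdot n$ with $x_i = i$ gives a weighted sum $LRC = \sum_i w_i y_i$ with $w_i = \frac{2(3i - n - 1)}{n(n+1)}$, which is the same as $3 \cdot WMA - 2 \cdot SMA$, where WMA is the linearly weighted moving average. The sketch below (function names are illustrative) checks this numerically and shows how much a shock to the last bar moves each average.

```python
import numpy as np

def lrc_weights(n: int) -> np.ndarray:
    """Weight applied to each price y_1..y_n in the LRC endpoint value."""
    i = np.arange(1, n + 1, dtype=float)
    return 2 * (3 * i - n - 1) / (n * (n + 1))

def lrc_via_wma_sma(y) -> float:
    """LRC endpoint value computed as 3 * WMA - 2 * SMA."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    i = np.arange(1, n + 1, dtype=float)
    wma = np.sum(i * y) / (n * (n + 1) / 2)   # linearly weighted moving average
    return 3 * wma - 2 * y.mean()

n = 20
w = lrc_weights(n)
# Newest bar: w[-1] = 2(2n-1)/(n(n+1)), about 0.186, versus 1/n = 0.05 for an SMA.
# Oldest bars: weights are negative for i < (n+1)/3, which is what lets the
# line extrapolate past recent prices.

shock = 5.0                       # hypothetical one-bar price spike on the last bar
lrc_shift = shock * w[-1]         # about 0.93
sma_shift = shock / n             # 0.25
```

A single outlier on the most recent bar therefore moves the 20-period LRC nearly four times as far as it moves the 20-period SMA.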

Use Cases

| Scenario | Suitability | Notes |
|---|---|---|
| Trend identification and confirmation | High | Slope + R^2 combination assesses trend quality |
| Replacing traditional MA as a signal line | High | Reduces lag-induced delayed entries |
| Mean reversion strategies | Medium | Price reverts when deviating too far from LRC |
| High-frequency / tick data | Medium | Beware of computational performance |
| Pure ranging strategies | Low | LRC signals are unstable in sideways markets |

Comparison with Similar Indicators

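| Dimension | LRC/LSMA | SMA | EMA | HMA |
|---|---|---|---|---|
| Lag | Lowest | Highest | Medium | Low |
| Smoothness | Medium | High | Medium | Medium |
| Endpoint Sensitivity | High | Low | Medium | Medium-High |
| Computational Complexity | High | Low | Low | Medium |
| Additional Information | Slope, R^2 | None | None | None |

Best Practices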
  • Use LRC + R-Squared combination: Only trust trend signals when R^2 is sufficiently high
  • Combine with a linear regression channel (LRC +/- k standard errors) to build a complete trend trading system
  • Short-period LRC (e.g., 10 periods) captures short swings; long-period LRC (e.g., 50 periods) confirms the larger trend
  • Monitor the rate of change in slope: A positive slope that shrinks toward zero is an early sign of a weakening uptrend, and a flip from positive to negative suggests the trend has reversed
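
The channel mentioned above can be sketched as follows. Channel definitions vary between charting platforms, so this is one assumed convention rather than a standard: the band is the LRC endpoint value plus or minus `k` times the standard error of the regression, $\sqrt{SS_{res}/(n-2)}$, computed on the same rolling window. The function name `regression_channel` and the default `k=2.0` are illustrative, and `period` is assumed to be at least 3.

```python
import numpy as np
import pandas as pd

def regression_channel(close: pd.Series, period: int = 20, k: float = 2.0) -> pd.DataFrame:
    """Rolling LRC with bands at +/- k standard errors of the fit (period >= 3)."""
    x = np.arange(1, period + 1, dtype=float)

    def _mid(y):
        b, a = np.polyfit(x, y, 1)
        return a + b * period                  # LRC endpoint value

    def _std_err(y):
        b, a = np.polyfit(x, y, 1)
        resid = y - (a + b * x)
        # Standard error of the regression: two fitted parameters, so n - 2
        return np.sqrt(np.sum(resid ** 2) / (period - 2))

    roll = close.rolling(period)
    mid = roll.apply(_mid, raw=True)
    se = roll.apply(_std_err, raw=True)
    return pd.DataFrame({"lower": mid - k * se, "lrc": mid, "upper": mid + k * se})
```

The band narrows when prices hug the fitted line and widens in choppy stretches, so its width carries much the same information as R^2 but in price units.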