Next-Day Direction Prediction for SPY (Random Forest)
This project is my “first principles” attempt at a classic, high-friction question: can we predict the S&P 500’s next-day move (up/down) using only historical market data and engineered features? I used SPY as the proxy for broad U.S. equity conditions and framed the problem as binary classification (up vs. down), not price regression.
It’s important to be explicit about scope: this is an exploration of signal extraction + evaluation discipline (features, thresholds, rolling backtests), not a claim of market-beating certainty.
- Repo: https://github.com/m-turnergane/stock-predictor-using-randomforest
Why SPY, and why “direction” instead of price?
SPY is liquid, widely followed, and acts as a practical stand-in for the index behavior most people actually interact with.
Predicting the exact next close is usually the wrong target for early experiments. Direction is the simplest “unit test” for:
- whether features encode anything non-trivial
- whether evaluation methodology is honest
- whether a model can produce a selective signal (trade rarely, but with higher confidence)
Model framing
- Target: whether tomorrow's close is higher than today's close (1 = up, 0 = down).
- Model: RandomForestClassifier, an ensemble of decision trees where each tree votes up/down and the final prediction aggregates the votes across all trees.
- Decisioning: predictions come from predict_proba, then a threshold converts probability → trade/no-trade signal.
That last piece matters: in markets, how you threshold is often more important than the model choice.
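A minimal sketch of that framing, using synthetic stand-in data; the hyperparameters and variable names here are illustrative, not necessarily the repo's actual settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-ins for the engineered features and the next-day up/down target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = rng.integers(0, 2, size=500)

# Illustrative hyperparameters; the repo's settings may differ.
model = RandomForestClassifier(n_estimators=200, min_samples_split=50, random_state=1)
model.fit(X[:400], y[:400])

# predict_proba returns [P(down), P(up)] per row; keep the "up" column.
proba_up = model.predict_proba(X[400:])[:, 1]

# The threshold, not the model, turns a probability into a trade/no-trade signal.
signal = (proba_up >= 0.7).astype(int)  # 1 = act (predict up), 0 = stand aside
```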
Data ingestion and persistence
The pipeline starts by downloading SPY’s full available history via yfinance, then caching it locally (e.g., spy.json) to avoid unnecessary repeated pulls.
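A sketch of that ingest-and-cache step; the spy.json file name comes from the write-up, but the helper itself is illustrative:

```python
import os

import pandas as pd
import yfinance as yf

def load_spy(cache_path="spy.json"):
    """Download SPY's full history once, then serve it from a local cache."""
    if os.path.exists(cache_path):
        # Reuse the cached copy (a production version would also normalize
        # the date index, which round-trips through JSON as epoch timestamps).
        return pd.read_json(cache_path)
    df = yf.Ticker("SPY").history(period="max")  # full available history
    df.to_json(cache_path)
    return df

spy = load_spy()
```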
SPY since inception:
- SPY close price since inception (sets the broader market context immediately)

Feature engineering philosophy
The feature set is intentionally "classical technical" and time-local (a combined sketch follows this list):
1) Rolling context (means and ratios)
- weekly / quarterly / annual rolling means
- ratios between these means and the current close
- ratios between timeframes (e.g., annual vs weekly)
This gives the forest a way to learn regimes like "price extended vs mean," "trend strength," and "relative momentum."
2) Intraday structure
- open/close, high/close, low/close ratios
This approximates daily range, volatility, and position within the candle.
3) Simple momentum proxy
- weekly trend: count of up days in the last week
4) Prior-day values
Joining shifted OHLCV values (*_prev) lets the model reason about day-over-day change without explicitly engineered delta features.
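Taken together, the four groups might look like this in pandas. The window lengths (5/63/252 trading days for week/quarter/year) and all column names are my assumptions, not lifted from the repo; spy is the OHLCV frame from the ingestion step:

```python
# 1) Rolling context: means at several horizons, expressed as ratios to close.
horizons = {"weekly": 5, "quarterly": 63, "annual": 252}
for name, h in horizons.items():
    spy[f"close_ratio_{name}"] = spy["Close"] / spy["Close"].rolling(h).mean()

# Ratio between timeframes, e.g. annual vs. weekly trend strength.
spy["annual_weekly_ratio"] = (
    spy["Close"].rolling(252).mean() / spy["Close"].rolling(5).mean()
)

# 2) Intraday structure: where the close sits within the day's candle.
for col in ["Open", "High", "Low"]:
    spy[f"{col.lower()}_close_ratio"] = spy[col] / spy["Close"]

# 3) Momentum proxy: count of up days over the last week.
up_day = (spy["Close"] > spy["Close"].shift(1)).astype(int)
spy["weekly_trend"] = up_day.rolling(5).sum()

# 4) Prior-day values: shift OHLCV by one day and join as *_prev columns.
prev = spy[["Open", "High", "Low", "Close", "Volume"]].shift(1).add_suffix("_prev")
spy = spy.join(prev)

# Target: 1 if tomorrow's close is higher than today's.
spy["target"] = (spy["Close"].shift(-1) > spy["Close"]).astype(int)

# Rolling windows and shifts leave NaNs at the edges; drop them before modeling.
spy = spy.dropna()
```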
Dataset snapshot:
- The shifted dataframe head, with prior-day (*_prev) values joined for temporal context
- The target column appended: binary classification of next-day direction
Backtesting that respects time
Rather than a random train/test split, I used a rolling-window backtest (sketched after this list):
- Train on an expanding (or rolling) historical window
- Test on the next block
- Move forward through time
This is the minimum viable standard for time series ML. It won't eliminate all pitfalls, but it prevents the most common one: leaking the future into the past via shuffling.
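One way that loop can look, assuming the feature frame from above; the window sizes (2500-day initial train, 250-day test blocks) and the 0.7 threshold are illustrative defaults:

```python
import pandas as pd

def backtest(data, model, predictors, start=2500, step=250, threshold=0.7):
    """Walk forward: fit on all history before i, predict the next block."""
    chunks = []
    for i in range(start, data.shape[0], step):
        train = data.iloc[:i]                 # strictly past data
        test = data.iloc[i:i + step]          # strictly future block
        model.fit(train[predictors], train["target"])
        proba = model.predict_proba(test[predictors])[:, 1]
        preds = pd.Series((proba >= threshold).astype(int),
                          index=test.index, name="pred")
        chunks.append(pd.concat([test["target"], preds], axis=1))
    return pd.concat(chunks)
```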
Why precision (and why the trade count is low)
I evaluated primarily with precision, not accuracy.
In trading-like settings, especially when thresholding aggressively, you're effectively saying:
"I'd rather act rarely—but when I act, I want to be right more often."
With a higher threshold (e.g., 0.7), the model produces fewer "trade signals," which can inflate precision while lowering recall. That trade-off is not a bug—it's a design choice.
Results and interpretation
The tuning goal was to find a configuration where:
- the model doesn't fire constantly (noise)
- precision rises above "coin flip with confidence"
- predictions are stable enough to inspect and iterate on
Tuning produced a modest but meaningful precision improvement, and a relatively small number of predicted trade events over the full dataset window (i.e., selective signals rather than daily churn).
What matters most here is not a single number—it's whether the methodology produces a framework you can improve without fooling yourself.
Model predictions vs actual direction over the last 100 data points
What I'd add next
If this were a v2 in the Trading Lab category, I'd add three things that deepen credibility immediately:
1. A threshold sweep chart
Precision / recall (or precision / trade count) as the threshold changes from 0.5 → 0.8.
This shows that the 0.7 choice is principled, not arbitrary (a minimal sweep is sketched below).
Trade-off curve: as threshold increases, precision rises but trade count drops
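A minimal version of that sweep, with random stand-ins where the backtest's probabilities and true labels would go:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Stand-ins: in practice these come from the backtest (predict_proba output
# and the realized next-day direction).
rng = np.random.default_rng(0)
proba_up = rng.uniform(0, 1, size=1000)
y_true = rng.integers(0, 2, size=1000)

for t in np.arange(0.5, 0.81, 0.05):
    preds = (proba_up >= t).astype(int)
    # Precision is undefined when zero trades are signaled; guard for that.
    prec = precision_score(y_true, preds, zero_division=0)
    rec = recall_score(y_true, preds, zero_division=0)
    print(f"threshold={t:.2f}  trades={int(preds.sum()):4d}  "
          f"precision={prec:.3f}  recall={rec:.3f}")
```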
2. A confusion matrix + class balance
Especially useful because markets are not perfectly balanced up/down across every regime (sketch below).
Visual breakdown of true positives, false positives, true negatives, and false negatives
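A sketch, reusing the stand-in arrays and the final preds from the sweep above:

```python
from sklearn.metrics import confusion_matrix

# For binary 0/1 labels, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, preds).ravel()
print(f"TP={tp}  FP={fp}  TN={tn}  FN={fn}")

# Class balance: the share of up days in the evaluation window.
print(f"share of up days: {y_true.mean():.3f}")
```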
3. Transaction costs + slippage sensitivity
Even a simple "per-trade cost" assumption forces realism.
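Even a toy version makes the point; here the 5 bps per-trade cost and the simulated trade returns are purely illustrative:

```python
import numpy as np

# Stand-in: realized next-day returns on the days the model signaled a trade.
rng = np.random.default_rng(0)
trade_returns = rng.normal(0.0005, 0.01, size=40)

cost_per_trade = 0.0005  # 5 bps round-trip, purely illustrative
gross = trade_returns.sum()
net = (trade_returns - cost_per_trade).sum()
print(f"gross: {gross:.4f}  net of costs: {net:.4f}  drag: {gross - net:.4f}")
```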
Optional but high-value:
- Walk-forward parameter stability checks
- Regime segmentation (e.g., pre/post-2008, pre/post-2020)
- Add a small set of extra indicators (RSI/MACD) and compare incremental lift
- Feature importance plot (already easy with RF; see the sketch below)
Top 12 features ranked by their contribution to model predictions
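For that last item, a fitted forest already carries what the plot needs (model and predictors as in the backtest sketch):

```python
import pandas as pd

# Impurity-based importances, one per feature, summing to 1.
importances = pd.Series(model.feature_importances_, index=predictors)
print(importances.sort_values(ascending=False).head(12))
```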