Hindsight Labeling (ZigZag)

The data_labeler.py module is the "Teacher" of the Signal.Engine ecosystem. Traditional labeling looks only at the next day's price, which is often noisy; this module instead uses a ZigZag hindsight algorithm to identify structural market turns.

By looking at historical data, the labeler identifies what a perfect trader should have done at any given moment. These "Golden Labels" are then used to pre-train the LSTM model during the Supervised Fine-Tuning (SFT) phase.

The Concept: Structural Labeling

Standard labeling (e.g., Price_Tomorrow > Price_Today) results in a dataset filled with market noise. Signal.Engine uses hindsight to find significant peaks and troughs.

Peak/Trough Detection: The algorithm scans historical OHLCV data to find points where the price reversed by a significant percentage (the "Deviation Threshold").
Action Assignment:
- Buy (2): Assigned to the candles of an up-leg, i.e. the periods leading up to a major peak.
- Sell (0): Assigned to the candles of a down-leg, i.e. the periods leading down to a major trough.
- Hold (1): Assigned to sideways or low-conviction price action.
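The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in data_labeler.py: the function names `zigzag_pivots` and `label_legs` are hypothetical, only closing prices are used, and candles that are not covered by a confirmed leg (before the first pivot and after the last one) are assumed to stay on Hold.

```python
from typing import List, Tuple

SELL, HOLD, BUY = 0, 1, 2  # label codes as documented above


def zigzag_pivots(closes: List[float], threshold: float) -> List[Tuple[int, str]]:
    """Return confirmed pivots as (index, kind), kind being "peak" or "trough".

    A peak is confirmed once price falls `threshold` (a fraction) below the
    running high; a trough once price rises `threshold` above the running low.
    """
    pivots: List[Tuple[int, str]] = []
    trend = 0      # 0 = undecided, +1 = up-leg, -1 = down-leg
    hi = lo = 0    # indices of the running high / low candidates
    for i in range(1, len(closes)):
        price = closes[i]
        if trend >= 0 and price > closes[hi]:
            hi = i
        if trend <= 0 and price < closes[lo]:
            lo = i
        if trend >= 0 and price <= closes[hi] * (1 - threshold):
            pivots.append((hi, "peak"))      # reversal down confirms the peak
            trend, lo = -1, i
        elif trend <= 0 and price >= closes[lo] * (1 + threshold):
            pivots.append((lo, "trough"))    # reversal up confirms the trough
            trend, hi = 1, i
    return pivots


def label_legs(n: int, pivots: List[Tuple[int, str]]) -> List[int]:
    """Label each candle between two pivots by the direction of that leg."""
    labels = [HOLD] * n
    for (start, _), (end, end_kind) in zip(pivots, pivots[1:]):
        action = BUY if end_kind == "peak" else SELL
        for i in range(start, end):
            labels[i] = action
    return labels


closes = [100, 97, 90, 96, 104, 110, 103, 98, 92, 99, 107]
pivots = zigzag_pivots(closes, threshold=0.05)
labels = label_legs(len(closes), pivots)
print(pivots)  # [(0, 'peak'), (2, 'trough'), (5, 'peak'), (8, 'trough')]
print(labels)  # [0, 0, 2, 2, 2, 0, 0, 0, 1, 1, 1]
```

The last three candles stay on Hold because the final up-leg has not yet been confirmed by a 5% reversal; how the real module treats such an open leg is not stated here.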

Public Interface & Usage

The primary entry point is data_labeler.py. It processes raw ticker data and generates a labeled CSV ready for model training.

Generating a Labeled Dataset

To generate labels for a specific ticker, run the module directly:

python -m src.data_labeler --symbol SBIN.NS --threshold 0.05

Arguments:

--symbol: The Yahoo Finance ticker (e.g., RELIANCE.NS, AAPL).
--threshold: The minimum price reversal required to define a new ZigZag trend, given as a decimal fraction (default: 0.03, i.e. 3%).
--save_path: (Optional) Custom path for the resulting .csv or .pt file.
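For readers who want to see how these flags fit together, here is a hedged sketch of a parser with the same three arguments. It assumes the CLI is built with Python's standard argparse module and that --symbol is required; neither is confirmed by this page, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="data_labeler")
    parser.add_argument("--symbol", required=True,
                        help="Yahoo Finance ticker, e.g. RELIANCE.NS or AAPL")
    parser.add_argument("--threshold", type=float, default=0.03,
                        help="minimum reversal as a fraction (0.03 means 3 percent)")
    parser.add_argument("--save_path", default=None,
                        help="optional custom output path")
    return parser


# Equivalent to: python -m src.data_labeler --symbol SBIN.NS --threshold 0.05
args = build_parser().parse_args(["--symbol", "SBIN.NS", "--threshold", "0.05"])
print(args.symbol, args.threshold, args.save_path)
```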

Data Output Format

The labeler produces a dataset where each timestamp is associated with a "Golden Action." This action represents the ground truth for what the agent should have predicted at that specific window in time.

| Timestamp | Close | RSI | ... | Golden_Label |
| :--- | :--- | :--- | :--- | :--- |
| 2023-01-01 | 2500.5 | 45.2 | ... | 2 (Buy) |
| 2023-01-02 | 2510.2 | 48.1 | ... | 2 (Buy) |
| 2023-05-10 | 2800.0 | 72.0 | ... | 0 (Sell) |
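A quick sanity check on a labeled file is to count how often each action occurs, since a dataset dominated by one class is hard to train on. The sketch below uses only the standard library and the three rows shown above; it assumes the Golden_Label column stores the bare integer code (0, 1 or 2), which this page does not confirm, and `label_distribution` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

LABEL_NAMES = {0: "Sell", 1: "Hold", 2: "Buy"}

# Stand-in for the generated CSV, with the elided indicator columns left out.
sample_csv = """Timestamp,Close,RSI,Golden_Label
2023-01-01,2500.5,45.2,2
2023-01-02,2510.2,48.1,2
2023-05-10,2800.0,72.0,0
"""


def label_distribution(handle) -> dict:
    """Count rows per action name in a labeled CSV."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[LABEL_NAMES[int(row["Golden_Label"])]] += 1
    return dict(counts)


distribution = label_distribution(io.StringIO(sample_csv))
print(distribution)  # {'Buy': 2, 'Sell': 1}
```

For a real file, pass `open(path, newline="")` instead of the `io.StringIO` wrapper.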

Integration with SFT

Once labels are generated, they are consumed by sft_dataset.py to create sliding window sequences (default: 50 candles).
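The windowing step can be illustrated as follows. This is a sketch under assumptions rather than the code in sft_dataset.py: `make_windows` is a hypothetical name, and each window is assumed to take the label of its final candle, which this page does not specify.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    window: int = 50,
) -> List[Tuple[List[Sequence[float]], int]]:
    """Pair every run of `window` consecutive candles with one target label."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    samples = []
    for end in range(window, len(features) + 1):
        sequence = list(features[end - window:end])
        target = labels[end - 1]  # assumption: label of the last candle
        samples.append((sequence, target))
    return samples


# 60 candles with one feature each yield 60 - 50 + 1 = 11 windows.
features = [[float(i)] for i in range(60)]
labels = [i % 3 for i in range(60)]
samples = make_windows(features, labels)
print(len(samples))  # 11
```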

Teacher Forcing: During training in train_sft.py, the model is penalized whenever its predicted action deviates from the ZigZag label for that window.
Common Sense: This is meant to teach the agent basic trend-following behavior (e.g., "Don't buy at the very top of a 20% vertical move") before it ever enters the Reinforcement Learning environment.
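To make "penalized" concrete, the sketch below computes a cross-entropy loss over the three actions for a single window. Cross-entropy is an assumption here, chosen because it is the usual loss for this kind of classification; the loss actually used in train_sft.py is not stated on this page.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], golden_label: int) -> float:
    """Negative log-probability the model assigns to the golden action."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[golden_label]


# Logits are ordered [Sell, Hold, Buy]; the golden label is Buy (2).
agree = cross_entropy([0.1, 0.2, 3.0], golden_label=2)      # model favors Buy
disagree = cross_entropy([3.0, 0.2, 0.1], golden_label=2)   # model favors Sell
print(agree < disagree)  # True
```

The loss is small when the model already favors the golden action and grows as probability mass moves to another action, which is the pressure that pulls the model toward the ZigZag labels.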

Internal Configuration

While the logic is automated, the "sensitivity" of the hindsight can be adjusted in the code via the following parameters (found in src/data_labeler.py):

min_pivot_dist: The minimum number of candles between two structural pivots.
neutral_buffer: A volatility-based filter that converts low-confidence Buy/Sell labels into Hold (1) to prevent the model from over-trading in flat markets.
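One way such a filter could work is sketched below, covering neutral_buffer only. Everything about it is an assumption: volatility is taken to be the mean absolute candle-to-candle return over a short lookback, and `apply_neutral_buffer` and `lookback` are hypothetical names that may not match src/data_labeler.py.

```python
from typing import List, Sequence

SELL, HOLD, BUY = 0, 1, 2


def apply_neutral_buffer(
    closes: Sequence[float],
    labels: Sequence[int],
    neutral_buffer: float,
    lookback: int = 5,
) -> List[int]:
    """Set a candle to Hold when recent volatility is below `neutral_buffer`."""
    filtered = list(labels)
    for i in range(len(closes)):
        start = max(1, i - lookback + 1)
        returns = [abs(closes[j] / closes[j - 1] - 1.0) for j in range(start, i + 1)]
        if not returns:
            continue  # the first candle has no return to measure
        volatility = sum(returns) / len(returns)
        if volatility < neutral_buffer:
            filtered[i] = HOLD
    return filtered


closes = [100.0, 100.1, 100.0, 100.1, 105.0, 110.0]
raw_labels = [BUY] * 6
filtered = apply_neutral_buffer(closes, raw_labels, neutral_buffer=0.005, lookback=2)
print(filtered)  # [2, 1, 1, 1, 2, 2]
```

The flat stretch around 100 is downgraded to Hold, while the candles in the strong move to 110 keep their Buy label.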