Supervised Fine-Tuning (SFT)
The Supervised Fine-Tuning (SFT) phase is the first stage of the Signal.Engine "Brain" development. Before the agent is exposed to the complexities of Reinforcement Learning (RL), it undergoes SFT to acquire "Market Common Sense."
In this phase, the LSTM model learns to identify high-probability trend reversals and continuations by observing a Golden Dataset—a collection of historical market data where the "perfect" actions have been labeled using hindsight.
The "Golden Dataset" & ZigZag Labeling
The system uses src/data_labeler.py to process historical Nifty 500 data. It employs a ZigZag algorithm that labels each bar with hindsight — using future price information to identify local minima and maxima.
- Logic: If the price subsequently rises by X% without first dropping by Y%, the current window is labeled BUY (UP); the mirror case is labeled DOWN, and anything else is NEUTRAL.
- Purpose: This eliminates the "noise" of random price action and teaches the model to recognize the specific sequence of 50 candles that typically precedes a significant move.
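The hindsight-labeling logic above can be sketched as follows. This is a minimal illustration, not the actual src/data_labeler.py implementation — the function name, thresholds, and label encoding (matching the 0/1/2 scheme described later) are assumptions:

```python
import numpy as np

def zigzag_labels(close: np.ndarray, up_pct: float = 0.05, down_pct: float = 0.03) -> np.ndarray:
    """Label each bar with hindsight: 2 (UP) if price rallies up_pct before
    falling down_pct, 0 (DOWN) in the mirror case, 1 (NEUTRAL) otherwise."""
    labels = np.ones(len(close), dtype=np.int64)  # default: NEUTRAL
    for i in range(len(close)):
        entry = close[i]
        for j in range(i + 1, len(close)):  # scan the future from bar i
            up = close[j] / entry - 1.0
            down = 1.0 - close[j] / entry
            if up >= up_pct:
                labels[i] = 2   # rose X% before dropping Y% -> UP
                break
            if down >= down_pct:
                labels[i] = 0   # fell Y% before rising X% -> DOWN
                break
    return labels
```

Because each label peeks at future bars, these targets are only valid for offline training — they can never be computed live.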
LSTM Architecture
The model defined in src/lstm_model.py (the LSTMPredictor) is a multi-layer Long Short-Term Memory network designed to process time-series sequences rather than static snapshots.
| Component | Specification |
| :--- | :--- |
| Input Window | Last 50 candles (Time-series) |
| Features | Close Price, RSI, MACD, Log Returns |
| Hidden Layers | 2-Layer LSTM (64 units) |
| Output | 3 Classes: 0: DOWN, 1: NEUTRAL, 2: UP |
| Loss Function | Weighted Cross-Entropy (to handle class imbalance) |
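The table above maps onto a model roughly like the following sketch. The layer composition and the class-weight values are illustrative assumptions — the real src/lstm_model.py may differ in detail:

```python
import torch
import torch.nn as nn

class LSTMPredictor(nn.Module):
    """2-layer LSTM over a 50-candle window of 4 features -> 3 classes."""
    def __init__(self, n_features: int = 4, hidden: int = 64, n_classes: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)          # (batch, 50, hidden)
        return self.head(out[:, -1])   # classify from the final time step

model = LSTMPredictor()
# Weighted cross-entropy: up-weight the rare DOWN/UP classes vs. NEUTRAL
class_weights = torch.tensor([2.0, 0.5, 2.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = model(torch.randn(32, 50, 4))           # (batch, window, features)
loss = criterion(logits, torch.randint(0, 3, (32,)))
```

Classifying from the last hidden state keeps the head simple; the LSTM is responsible for summarising the 50-candle sequence into that state.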
Running SFT Training
To initialize the model and begin the supervised learning process, use the train_sft.py script. This will generate the foundational weights used later by the RL agent.
```shell
# Generate the labels and train the base LSTM model
python -m src.train_sft --epochs 50 --batch_size 32
```
Upon completion, the model weights are saved to checkpoints_sft/final_sft_model.pth.
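Reloading that checkpoint for inference looks roughly like the sketch below. The architecture stub must match what was trained; here the state dict is written to a temporary path purely so the round trip is self-contained (the real file is checkpoints_sft/final_sft_model.pth):

```python
import torch
import torch.nn as nn

class LSTMPredictor(nn.Module):
    # Stub matching the assumed training architecture (2x64 LSTM, 3-class head)
    def __init__(self, n_features: int = 4, hidden: int = 64, n_classes: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])

# Round-trip: save, then restore into a fresh instance
model = LSTMPredictor()
torch.save(model.state_dict(), "/tmp/final_sft_model.pth")

restored = LSTMPredictor()
restored.load_state_dict(torch.load("/tmp/final_sft_model.pth", map_location="cpu"))
restored.eval()  # disable training-mode behavior for inference
```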
Key Performance Indicators (KPIs)
During SFT, the objective is not profit, but Pattern Recognition Accuracy.
- Target Accuracy: ~77% on the validation set.
- Validation: The model is tested on unseen symbols from the Nifty 500 to ensure the "common sense" generalizes across different sectors and volatility regimes.
- Convergence: Training uses a ReduceLROnPlateau scheduler, which automatically lowers the learning rate when validation loss stagnates, allowing finer weight updates late in training.
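The scheduler's behaviour can be shown in isolation. The factor and patience values below are illustrative, not necessarily those used by train_sft.py:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=1e-3)
# Halve the LR once validation loss fails to improve for 3 consecutive epochs
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=3)

# Simulated per-epoch validation losses: improvement, then a plateau
for val_loss in [0.9, 0.8, 0.8, 0.8, 0.8, 0.8]:
    scheduler.step(val_loss)  # pass the monitored metric each epoch
```

After the plateau exceeds the patience window, the optimizer's learning rate drops from 1e-3 to 5e-4 without any manual intervention.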
Transitioning to RL
Once the SFT model achieves satisfactory accuracy, it serves as the Initial Policy for the Reinforcement Learning phase (train_ppo_optimized.py).
By starting with a pre-trained SFT model, the RL agent does not have to learn what a "trend" is from scratch via trial-and-error; it instead focuses on higher-level tasks like Risk Management, Position Sizing, and Execution Timing.
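One common way to implement this warm start is to reuse the pre-trained LSTM as a shared feature extractor beneath fresh actor and critic heads. The class and attribute names below are hypothetical, not taken from train_ppo_optimized.py:

```python
import torch
import torch.nn as nn

class LSTMBackbone(nn.Module):
    # Stub of the SFT feature extractor (2x64 LSTM over 50 candles x 4 features)
    def __init__(self, n_features: int = 4, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)

    def forward(self, x):
        out, _ = self.lstm(x)
        return out[:, -1]  # final hidden state as the market-state embedding

class ActorCritic(nn.Module):
    """PPO policy that reuses the SFT backbone for feature extraction."""
    def __init__(self, backbone: LSTMBackbone, hidden: int = 64, n_actions: int = 3):
        super().__init__()
        self.backbone = backbone
        self.actor = nn.Linear(hidden, n_actions)  # action logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def forward(self, x):
        z = self.backbone(x)
        return self.actor(z), self.critic(z)

backbone = LSTMBackbone()
# In the real pipeline, the shared LSTM weights would be copied here from
# checkpoints_sft/final_sft_model.pth (e.g. load_state_dict with strict=False).
policy = ActorCritic(backbone)
logits, value = policy(torch.randn(8, 50, 4))
```

Only the heads start from random weights, so early RL exploration happens on top of features that already encode trend structure.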
Note: While SFT provides high predictive accuracy, it is not used in isolation for live trading. It lacks the risk-awareness provided by the PPO (Reinforcement Learning) layer.