Hybrid Brain Architecture (v3.0)
The core of Signal.Engine is a multi-layered reasoning system known as the Hybrid Brain. Moving beyond simple binary classifiers, this architecture treats market analysis as a sequence-modeling problem, combining the predictive power of Deep Learning with the disciplined decision-making of Reinforcement Learning.
The "Brain" functions as a three-stage pipeline: Perception (LSTM), Reasoning (SFT), and Optimization (PPO).
1. Temporal Perception: Sequence Modeling (LSTM)
The foundation of the brain is a Long Short-Term Memory (LSTM) network designed to process time-series data as a continuous narrative rather than isolated snapshots.
- Window Logic: The model ingests the last 50 candles (trading periods) to identify momentum, support/resistance levels, and volatility clusters.
- Feature Set: It processes a multi-dimensional vector including price action (Close, Log_Return) and technical oscillators (RSI, MACD).
- Internal Stability: Utilizes Batch Normalization and Dropout layers to maintain inference stability during high-volatility market regimes.
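The windowing described above can be sketched in a few lines. This is an illustrative example, not Signal.Engine's actual preprocessing code: the helper name `make_windows` and the toy feature rows are assumptions, while the window length (50) and feature names (Close, Log_Return, RSI, MACD) come from the text.

```python
# Illustrative sketch of assembling 50-candle windows for the LSTM.
# `make_windows` is a hypothetical helper, not Signal.Engine API.
WINDOW = 50

def make_windows(rows, window=WINDOW):
    """Slice a list of per-period feature vectors into overlapping
    fixed-length windows: len(rows) - window + 1 sequences in total."""
    return [rows[i:i + window] for i in range(len(rows) - window + 1)]

# 60 periods, each with 4 features: [Close, Log_Return, RSI, MACD]
rows = [[100.0 + i, 0.001, 50.0, 0.2] for i in range(60)]
windows = make_windows(rows)
print(len(windows), len(windows[0]), len(windows[0][0]))  # 11 50 4
```

Each window is then fed to the LSTM as one sequence, so the model sees a continuous 50-period narrative rather than a single snapshot.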
2. The Teacher: Supervised Fine-Tuning (SFT)
Before the agent is allowed to trade, it undergoes Supervised Fine-Tuning. This phase builds "Market Common Sense."
- The Golden Dataset: The model is trained on hindsight-labeled data using a ZigZag Labeler. By looking at past charts, the teacher shows the model exactly where the "perfect" trades were.
- Pattern Recognition: This phase achieves ~77% directional accuracy, teaching the agent to recognize classic trend reversals and breakout patterns.
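A hindsight labeler of the kind described above can be sketched as follows. This is a simplified ZigZag-style example, not the engine's actual labeler: the function name `zigzag_labels` and the 5% reversal threshold are illustrative assumptions, while the 0/1/2 label scheme matches the action space defined later in this document.

```python
# Hedged sketch of hindsight (ZigZag-style) labeling; the real
# Signal.Engine labeler may differ in threshold and swing logic.
def zigzag_labels(closes, threshold=0.05):
    """Label each bar 2 (BUY) if price later rises by `threshold`
    before falling by it, 0 (SELL) in the opposite case, else 1 (HOLD)."""
    labels = []
    for i, price in enumerate(closes):
        label = 1  # HOLD by default (no decisive move in hindsight)
        for future in closes[i + 1:]:
            move = (future - price) / price
            if move >= threshold:
                label = 2  # BUY: hindsight shows an up-leg follows
                break
            if move <= -threshold:
                label = 0  # SELL: hindsight shows a down-leg follows
                break
        labels.append(label)
    return labels

print(zigzag_labels([100, 101, 107, 106, 99]))  # [2, 2, 0, 0, 1]
```

Because the labels peek at the future, they are usable only for supervised training on historical data, never at inference time.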
3. The Strategist: Reinforcement Learning (PPO)
While the LSTM learns direction, the Proximal Policy Optimization (PPO) layer learns strategy. This is where the agent develops its "trading personality."
- Risk-Adjusted Returns: The PPO agent is trained in a vectorized GPU environment where it is rewarded not just for profit, but for maintaining a high Sharpe Ratio and low Maximum Drawdown.
- Action Space: The agent chooses between three discrete actions:
  - 0: SELL / SHORT (Bearish conviction)
  - 1: HOLD / NEUTRAL (Preserving capital during noise)
  - 2: BUY / LONG (Bullish conviction)
- Reward Shaping: The agent is penalized for over-trading (transaction costs) and rewarded for holding winning positions through trend extensions.
4. Heuristic & Risk Overlay
The final decision is filtered through a Heuristic Expert layer. This layer produces the "Rational" entries shown in the dashboard, ensuring the AI's "black box" decisions align with quantitative risk parameters.
- Regime Detection: Monitors market mood (e.g., "VOLATILE" vs "CALM") to adjust position sizing.
- Confidence Scoring: Every trade is assigned a confidence value (0.0 to 1.0). Trades below the configured threshold (typically 0.85) are automatically discarded.
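A minimal sketch of the overlay's gate, assuming a trade is a dict with `Confidence` and `Regime` keys: the 0.85 threshold comes from the text, while the function name `filter_trades` and the 0.5 sizing factor for volatile regimes are illustrative assumptions.

```python
# Hedged sketch of the heuristic overlay: discard low-confidence trades
# and shrink position size in volatile regimes. The 0.5 sizing factor
# is an assumption, not the engine's actual parameter.
CONF_THRESHOLD = 0.85

def filter_trades(trades, threshold=CONF_THRESHOLD):
    accepted = []
    for t in trades:
        if t["Confidence"] < threshold:
            continue  # auto-discard low-conviction signals
        size = 0.5 if t.get("Regime") == "VOLATILE" else 1.0
        accepted.append({**t, "Size": size})
    return accepted

signals = [
    {"Ticker": "RELIANCE.NS", "Confidence": 0.92, "Regime": "CALM"},
    {"Ticker": "TCS.NS", "Confidence": 0.60, "Regime": "VOLATILE"},
]
print(filter_trades(signals))  # only the 0.92-confidence trade survives
```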
Interacting with the Brain
As a developer, you interact with the Hybrid Brain primarily through the API or the specialized trading scripts.
Programmatic Access (Python)
To trigger the brain's reasoning process manually within the environment:
from src.brain.hybrid import HybridBrain

brain = HybridBrain()

# The .think() method runs the full LSTM + PPO inference pipeline
decisions = brain.think()

for trade in decisions:
    print(f"Ticker: {trade['Ticker']}")
    print(f"Action: {trade['Action']}")
    print(f"Confidence: {trade['Confidence']:.2%}")
    print(f"Rational: {', '.join(trade['Rational'])}")
API Access (REST)
The frontend communicates with the brain via the FastAPI backend. You can poll the latest "thoughts" from the brain using the following endpoint:
Endpoint: GET /api/results
Response Structure:
{
"status": "success",
"data": [
{
"Ticker": "RELIANCE.NS",
"Action": "BUY",
"Confidence": 0.92,
"Rational": ["Strong Momentum", "Low Volatility Regime", "RSI Divergence"]
}
],
"is_thinking": false
}
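Consuming that payload from Python is straightforward. The snippet below parses the example response shown above; in a real client you would fetch the JSON over HTTP first (e.g. with the `requests` library), which is omitted here so the example runs standalone.

```python
import json

# Parse the /api/results payload (example body from the docs above)
payload = json.loads("""
{
  "status": "success",
  "data": [
    {"Ticker": "RELIANCE.NS", "Action": "BUY", "Confidence": 0.92,
     "Rational": ["Strong Momentum", "Low Volatility Regime", "RSI Divergence"]}
  ],
  "is_thinking": false
}
""")

# Only act on settled results: skip responses while the brain is mid-cycle
if payload["status"] == "success" and not payload["is_thinking"]:
    for trade in payload["data"]:
        print(f"{trade['Ticker']}: {trade['Action']} ({trade['Confidence']:.0%})")
```

Note the `is_thinking` flag: a polling client should skip (or retry) while it is true, since the `data` array may reflect the previous cycle.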
Configuration & Checkpoints
The Brain’s behavior is determined by the weights stored in the checkpoints/ directory.
- final_sft_model.pth: The "Knowledge Base" (directional accuracy).
- best_ppo.ckpt: The "Execution Logic" (risk management and timing).
To update the brain's intelligence, replace these files with newly trained weights from train_ppo_optimized.py.