# Native Thinking

Set `native_thinking=True` on any `ChatConfig` to receive the model's **actual internal
reasoning**: the tokens the model generates while working through the problem before it
writes its answer, rather than the result of a prompted instruction to "think step by step".

This differs from `chain_of_thought=True`, a prompt-engineering technique that works on
all models. Native thinking is a **model-level feature** supported only by specific
models, and the thinking text is returned in dedicated fields, separate from the
final answer.

## Supported models

| Provider | Models | Thinking visible? |
| --- | --- | --- |
| **Anthropic** | `claude-3-7-sonnet-20250219` and later | Yes — full text, streamable |
| **Google** | `gemini-2.5-pro`, `gemini-2.0-flash-thinking-exp` | Yes — full text, streamable |
| **OpenAI** | `o1`, `o3`, `o3-mini` | No — token count only (`usage["reasoning_tokens"]`) |

---

## Quick start — Anthropic (streaming)

```python
from ractogateway import anthropic_developer_kit as claude
from ractogateway.prompts.engine import RactoPrompt

prompt = RactoPrompt(
    role="Expert mathematician",
    aim="Solve the user's problem completely.",
    constraints=["Show the final answer clearly."],
    tone="Precise",
    output_format="text",
)
kit = claude.Chat(
    model="claude-3-7-sonnet-20250219",
    default_prompt=prompt,
)

print("=== THINKING ===")
answer_started = False
for chunk in kit.stream(claude.ChatConfig(
    user_message="How many trailing zeros does 100! have?",
    native_thinking=True,
    thinking_budget=8000,
)):
    if chunk.is_thinking:
        print(chunk.delta.thinking, end="", flush=True)
    elif chunk.delta.text:
        if not answer_started:
            # Print the separator once, when the first answer text arrives.
            print("\n\n=== ANSWER ===")
            answer_started = True
        print(chunk.delta.text, end="", flush=True)
```

**Expected output:**

```
=== THINKING ===
I need to find the number of trailing zeros in 100!.
Trailing zeros come from factors of 10, and 10 = 2 × 5.
Since factors of 2 appear much more often than 5, I just need to count factors of 5.

Factors of 5 in 100!:
- Numbers divisible by 5:  100 ÷ 5  = 20
- Numbers divisible by 25: 100 ÷ 25 = 4
- Numbers divisible by 125: 100 ÷ 125 = 0 (125 > 100)

Total = 20 + 4 = 24

=== ANSWER ===
100! has **24 trailing zeros**.
```
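
The reasoning in this sample output is Legendre's formula for the exponent of 5 in n!. As a sanity check that does not depend on any model, the following plain-Python sketch reproduces the count and compares it with a brute-force count of the digits. Both helper names are made up for this example and are not part of the library.

```python
import math


def trailing_zeros_of_factorial(n: int) -> int:
    """Count trailing zeros of n! by summing floor(n / 5**k) for k = 1, 2, ..."""
    count = 0
    power = 5
    while power <= n:
        count += n // power
        power *= 5
    return count


def trailing_zeros_brute_force(n: int) -> int:
    """Count trailing zeros by inspecting the decimal digits of n! directly."""
    digits = str(math.factorial(n))
    return len(digits) - len(digits.rstrip("0"))


print(trailing_zeros_of_factorial(100))  # 24  (20 from multiples of 5, 4 from multiples of 25)
print(trailing_zeros_brute_force(100))   # 24
```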

---

## Quick start — Anthropic (non-streaming)

```python
# Reuses `kit` from the streaming quick start above.
response = kit.chat(claude.ChatConfig(
    user_message="How many trailing zeros does 100! have?",
    native_thinking=True,
    thinking_budget=8000,
))

print("THINKING:", response.thinking)
# THINKING: I need to find the number of trailing zeros in 100!.
# Trailing zeros come from factors of 10, and 10 = 2 × 5. ...

print("ANSWER:", response.content)
# ANSWER: 100! has **24 trailing zeros**.
```

---

## Quick start — Google Gemini (streaming)

```python
from ractogateway import google_developer_kit as gemini

# Reuses the `prompt` defined in the Anthropic quick start above.
kit = gemini.Chat(
    model="gemini-2.5-pro",
    default_prompt=prompt,
)

for chunk in kit.stream(gemini.ChatConfig(
    user_message="What is the probability of getting at least one 6 in four dice rolls?",
    native_thinking=True,
    thinking_budget=4096,
)):
    if chunk.is_thinking:
        print(chunk.delta.thinking, end="", flush=True)
    elif chunk.delta.text:
        print(chunk.delta.text, end="", flush=True)
```

**Expected output (thinking):**

```
The complement of "at least one 6" is "no 6 in any of the four rolls".
P(no 6 on a single roll) = 5/6
P(no 6 in four rolls)    = (5/6)^4 = 625/1296
P(at least one 6)        = 1 − 625/1296 = 671/1296 ≈ 0.5177
```

**Expected output (answer):**

```
The probability is **671/1296 ≈ 51.8%**.
```
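
The arithmetic in this sample output can be checked exactly with Python's standard `fractions` module. This is only a verification sketch for the numbers shown and involves no model call.

```python
from fractions import Fraction

# Probability of not rolling a 6 on one fair die.
p_no_six_single = Fraction(5, 6)

# Four independent rolls with no 6 at all.
p_no_six_four = p_no_six_single ** 4

# Complement: at least one 6 in four rolls.
p_at_least_one = 1 - p_no_six_four

print(p_no_six_four)                     # 625/1296
print(p_at_least_one)                    # 671/1296
print(round(float(p_at_least_one), 4))   # 0.5177
```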

---

## Quick start — OpenAI o-series (reasoning token count)

OpenAI's o-series models (`o1`, `o3`, `o3-mini`) reason internally and do not expose
the reasoning text. `native_thinking=True` is accepted but has no effect on the API
request. Whenever the API reports a reasoning token count, it is added to
`usage["reasoning_tokens"]` regardless of the flag.

```python
from ractogateway import openai_developer_kit as gpt

# Reuses the `prompt` defined in the Anthropic quick start above.
kit = gpt.Chat(model="o3-mini", default_prompt=prompt)

response = kit.chat(gpt.ChatConfig(
    user_message="Solve x² − 5x + 6 = 0",
    native_thinking=True,   # optional flag; no-op for OpenAI
))

print(response.content)
# x = 2  or  x = 3

print(response.usage)
# {
#   "prompt_tokens": 142,
#   "completion_tokens": 89,
#   "total_tokens": 231,
#   "reasoning_tokens": 64      ← reasoning tokens consumed
# }
```
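
To put the `reasoning_tokens` figure in context, the sketch below splits a usage dict shaped like the one above into hidden reasoning tokens and visible answer tokens. It assumes the gateway passes through OpenAI's accounting, in which reasoning tokens are counted as part of `completion_tokens`. `split_completion_tokens` is a hypothetical helper, not part of the library.

```python
def split_completion_tokens(usage: dict) -> dict:
    """Split completion tokens into hidden reasoning and visible answer tokens.

    Assumes reasoning tokens are included in completion_tokens (OpenAI's
    convention). A missing "reasoning_tokens" key is treated as zero.
    """
    completion = usage.get("completion_tokens", 0)
    reasoning = usage.get("reasoning_tokens", 0)
    visible = max(completion - reasoning, 0)
    share = reasoning / completion if completion else 0.0
    return {"reasoning": reasoning, "visible": visible, "reasoning_share": share}


usage = {
    "prompt_tokens": 142,
    "completion_tokens": 89,
    "total_tokens": 231,
    "reasoning_tokens": 64,
}
breakdown = split_completion_tokens(usage)
print(breakdown["reasoning"], breakdown["visible"])   # 64 25
print(f"{breakdown['reasoning_share']:.0%}")          # 72%
```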

---

## Controlling the thinking budget

```python
config = claude.ChatConfig(
    user_message="Explain why P ≠ NP is hard to prove.",
    native_thinking=True,
    thinking_budget=20000,   # a larger budget leaves room for longer reasoning
    max_tokens=24000,        # must be > thinking_budget for Anthropic
)
```

| Parameter | Default | Notes |
| --- | --- | --- |
| `thinking_budget` | `10000` | Maximum number of tokens the model may spend on reasoning |
| `max_tokens` | `4096` | Anthropic: must be greater than `thinking_budget` |

> **Anthropic note:** `temperature` is automatically forced to `1`, so you do not need
> to set it. Any other value you pass is silently overridden.
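
The `max_tokens` constraint is easy to violate when tuning budgets. A minimal pre-flight check, sketched here in plain Python, can catch a bad pair before a request is sent. `check_anthropic_budget` is a hypothetical helper and not part of the library. The 1,024-token floor reflects Anthropic's documented minimum thinking budget as we understand it, so treat it as an assumption to verify for your model.

```python
# Assumed minimum thinking budget for Anthropic models; verify for your model.
ANTHROPIC_MIN_THINKING_BUDGET = 1024


def check_anthropic_budget(thinking_budget: int, max_tokens: int) -> None:
    """Raise ValueError if the pair violates the Anthropic constraints."""
    if thinking_budget < ANTHROPIC_MIN_THINKING_BUDGET:
        raise ValueError(
            f"thinking_budget ({thinking_budget}) is below the assumed minimum "
            f"of {ANTHROPIC_MIN_THINKING_BUDGET}"
        )
    if max_tokens <= thinking_budget:
        raise ValueError(
            f"max_tokens ({max_tokens}) must be greater than "
            f"thinking_budget ({thinking_budget})"
        )


check_anthropic_budget(thinking_budget=16000, max_tokens=20000)  # passes silently

try:
    check_anthropic_budget(thinking_budget=16000, max_tokens=8000)
except ValueError as exc:
    print(exc)  # max_tokens (8000) must be greater than thinking_budget (16000)
```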

---

## Reading the output

### Streaming

```python
for chunk in kit.stream(config):
    # --- thinking phase ---
    chunk.is_thinking            # True while thinking tokens stream
    chunk.delta.thinking         # the new reasoning text in this event
    chunk.accumulated_thinking   # all reasoning text so far

    # --- answer phase ---
    chunk.delta.text             # the new answer text in this event
    chunk.accumulated_text       # all answer text so far

    # --- final chunk ---
    chunk.is_final               # True on the last event
    chunk.accumulated_thinking   # complete reasoning
    chunk.accumulated_text       # complete answer
    chunk.usage                  # token counts
```
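
One way to use these fields is a small collector that drains a stream and returns the reasoning, the answer, and the usage together. The sketch below runs against stand-in objects so it works without an API key: `Delta` and `Chunk` are hypothetical dataclasses that only mimic the attributes listed above, and `collect` is an illustrative helper, not a library function.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Delta:
    """Hypothetical stand-in for chunk.delta."""
    text: Optional[str] = None
    thinking: Optional[str] = None


@dataclass
class Chunk:
    """Hypothetical stand-in exposing the stream fields used below."""
    delta: Delta
    is_thinking: bool = False
    is_final: bool = False
    usage: Optional[dict] = None


def collect(stream: Iterable[Chunk]) -> Tuple[str, str, Optional[dict]]:
    """Drain a stream and return (thinking, answer, usage)."""
    thinking_parts, text_parts, usage = [], [], None
    for chunk in stream:
        if chunk.is_thinking and chunk.delta.thinking:
            thinking_parts.append(chunk.delta.thinking)
        elif chunk.delta.text:
            text_parts.append(chunk.delta.text)
        if chunk.is_final:
            usage = chunk.usage  # token counts arrive with the last event
    return "".join(thinking_parts), "".join(text_parts), usage


fake_stream = [
    Chunk(Delta(thinking="Count factors "), is_thinking=True),
    Chunk(Delta(thinking="of 5."), is_thinking=True),
    Chunk(Delta(text="24 trailing ")),
    Chunk(Delta(text="zeros."), is_final=True, usage={"total_tokens": 42}),
]
thinking, answer, usage = collect(fake_stream)
print(thinking)  # Count factors of 5.
print(answer)    # 24 trailing zeros.
print(usage)     # {'total_tokens': 42}
```

With a real stream, the same result should be available from `chunk.accumulated_thinking` and `chunk.accumulated_text` on the final chunk; collecting deltas is shown here only to make the event flow explicit.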

### Non-streaming

```python
response = kit.chat(config)

response.thinking   # str | None  — complete reasoning text
response.content    # str | None  — final answer
response.usage      # dict with prompt_tokens, completion_tokens, total_tokens
                    # (+ reasoning_tokens for OpenAI o-series)
```
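
Because `response.thinking` is typed `str | None`, display code should tolerate a missing value. The sketch below assumes `thinking` is `None` when the provider exposes no reasoning text (presumably the case for the OpenAI o-series). `format_response` is a hypothetical helper, and `SimpleNamespace` objects stand in for real responses.

```python
from types import SimpleNamespace


def format_response(response) -> str:
    """Render a response as text, tolerating a missing thinking or content field."""
    parts = []
    if response.thinking:  # None or "" when no reasoning text is exposed
        parts.append(f"[thinking]\n{response.thinking}")
    parts.append(f"[answer]\n{response.content or ''}")
    return "\n\n".join(parts)


# Stand-ins that mimic only the attributes documented above.
with_thinking = SimpleNamespace(thinking="Count factors of 5.", content="24")
without_thinking = SimpleNamespace(thinking=None, content="x = 2 or x = 3")

print(format_response(with_thinking))
print(format_response(without_thinking))
```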

---

## Combining with `chain_of_thought`

`native_thinking` and `chain_of_thought` can be used together.
`chain_of_thought` adds a prompt-level instruction, while `native_thinking` enables the
model's built-in reasoning phase. On Anthropic and Google models, the model reasons
internally first (native) and then tends to spell out its steps more explicitly in the
answer itself (prompted).

```python
config = claude.ChatConfig(
    user_message="...",
    native_thinking=True,
    chain_of_thought=True,
)
```

---

## Practical: render thinking in a terminal

```python
previous_was_thinking = False

for chunk in kit.stream(config):
    if chunk.is_thinking:
        if not previous_was_thinking:
            print("\033[2m[thinking]\033[0m", flush=True)   # dim label
        print(f"\033[2m{chunk.delta.thinking}\033[0m", end="", flush=True)
        previous_was_thinking = True
    elif chunk.delta.text:   # skip events that carry no answer text
        if previous_was_thinking:
            print("\n[answer]", flush=True)   # normal intensity from here on
            previous_was_thinking = False
        print(chunk.delta.text, end="", flush=True)

print()
```

---

## Tips

| Goal | Setting |
| --- | --- |
| Deep, multi-step proofs / code | Raise `thinking_budget` to `32000`–`100000` (on Anthropic, keep `max_tokens` above it) |
| Fast simple answers | Lower `thinking_budget` to `1024`–`2048` |
| Only show the final answer | Ignore `chunk.delta.thinking`; read `chunk.accumulated_text` |
| Log full reasoning for debugging | Save `chunk.accumulated_thinking` on `chunk.is_final` |
| Async streaming | Use `astream()` — same fields, same behaviour |
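
The logging and async tips can be combined. The sketch below consumes an async stream and keeps the complete reasoning from the final chunk. `fake_astream` is a hypothetical stand-in for `kit.astream(config)` that yields objects with only the attributes used here, so the example runs without network access. `run_and_log` is likewise illustrative, not a library function.

```python
import asyncio
from types import SimpleNamespace


async def fake_astream():
    """Hypothetical stand-in for kit.astream(config); yields chunk-like objects."""
    yield SimpleNamespace(is_final=False, accumulated_thinking="Count ", accumulated_text="")
    yield SimpleNamespace(is_final=False, accumulated_thinking="Count fives.", accumulated_text="24")
    yield SimpleNamespace(is_final=True, accumulated_thinking="Count fives.", accumulated_text="24 zeros.")


async def run_and_log(stream) -> dict:
    """Consume an async stream and keep the full reasoning from the final chunk."""
    record = {"thinking": None, "answer": None}
    async for chunk in stream:
        if chunk.is_final:
            record["thinking"] = chunk.accumulated_thinking
            record["answer"] = chunk.accumulated_text
    return record


record = asyncio.run(run_and_log(fake_astream()))
print(record)  # {'thinking': 'Count fives.', 'answer': '24 zeros.'}
```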