ractogateway.finetune.openai_tuner

OpenAI fine-tuning adapter for RactoGateway.

Workflow

1. Build a RactoDataset.
2. Call OpenAIFineTuner.run_pipeline() for a one-shot end-to-end run, or call the lower-level methods individually:
   1. upload_dataset() → file_id
   2. create_job() → job_id
   3. wait_for_completion() → fine_tuned_model
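
The lower-level route can be sketched as a small helper. This is an illustration only, not library code: fine_tune_stepwise is a hypothetical name, and tuner is assumed to be an OpenAIFineTuner (any object exposing the three documented methods would work).

```python
from typing import Any


def fine_tune_stepwise(
    tuner: Any,
    dataset: Any,
    model: str = "gpt-4o-mini-2024-07-18",
) -> str:
    """Run the three documented steps one at a time (hypothetical helper)."""
    # Step 1: upload the training examples; the tuner returns a file ID.
    file_id = tuner.upload_dataset(dataset)
    # Step 2: submit a job against that file; the tuner returns a job ID.
    job_id = tuner.create_job(file_id, model=model)
    # Step 3: block until the job finishes; returns the fine-tuned model name.
    return tuner.wait_for_completion(job_id)
```

Splitting the steps is mainly useful when the file ID or job ID needs to be stored between runs, for example to resume waiting on a job after a restart.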

Supported base models (as of 2025)

gpt-4o-mini-2024-07-18 — recommended; cost-effective
gpt-4o-2024-08-06 — multimodal vision fine-tuning
gpt-3.5-turbo-0125 — legacy option

class ractogateway.finetune.openai_tuner.OpenAIFineTuner(api_key=None, *, base_url=None)

Bases: object

Fine-tune OpenAI models using the fine-tuning API.

Parameters:

api_key (str | None) – OpenAI API key. Falls back to the OPENAI_API_KEY environment variable when not supplied.
base_url (str | None) – Optional custom base URL (Azure OpenAI, proxy, etc.).

Examples

End-to-end pipeline (simplest usage):

from ractogateway.finetune import RactoDataset, OpenAIFineTuner

ds = RactoDataset.from_pairs(
    [("What is Python?", "A high-level programming language.")],
    system="You are a Python tutor.",
)
tuner = OpenAIFineTuner()
model = tuner.run_pipeline(ds, model="gpt-4o-mini-2024-07-18")
print(model)   # "ft:gpt-4o-mini-2024-07-18:org::abc123"

upload_dataset(dataset)

Upload dataset as an OpenAI training file.

Parameters: dataset (RactoDataset) – The training examples to upload.
Return type: str
Returns: str – The OpenAI file ID (e.g. "file-abc123").

create_job(training_file, model='gpt-4o-mini-2024-07-18', *, validation_file=None, n_epochs='auto', batch_size='auto', learning_rate_multiplier='auto', suffix=None)

Submit a fine-tuning job.

Parameters:

training_file (str) – File ID returned by upload_dataset().
model (str) – Base model to fine-tune.
validation_file (str | None) – Optional validation file ID (also produced by upload_dataset()).
n_epochs (int | str) – Number of training epochs, or "auto" to let OpenAI choose.
batch_size (int | str) – Number of examples per training batch, or "auto".
learning_rate_multiplier (float | str) – Scaling factor applied to the default learning rate, or "auto".
suffix (str | None) – Custom label appended to the fine-tuned model name.

Return type:

str

Returns:

str – The fine-tuning job ID (e.g. "ftjob-abc123").
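
As a sketch of how a validation file and explicit hyperparameters might be passed, the helper below uploads both datasets and submits one job. submit_with_validation is a hypothetical name, tuner is assumed to be an OpenAIFineTuner, and the value n_epochs=3 is an arbitrary choice for illustration.

```python
from typing import Any, Optional


def submit_with_validation(
    tuner: Any,
    train_ds: Any,
    val_ds: Optional[Any] = None,
    suffix: Optional[str] = None,
) -> str:
    """Upload training (and optional validation) data, then submit a job."""
    train_file = tuner.upload_dataset(train_ds)
    # The validation file ID is produced by the same upload method.
    val_file = tuner.upload_dataset(val_ds) if val_ds is not None else None
    return tuner.create_job(
        train_file,
        model="gpt-4o-mini-2024-07-18",
        validation_file=val_file,
        n_epochs=3,  # arbitrary illustrative value; "auto" is the default
        batch_size="auto",
        learning_rate_multiplier="auto",
        suffix=suffix,
    )
```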

get_status(job_id)

Retrieve the current status of a fine-tuning job.

Return type: dict[str, Any]
Returns: dict – Keys: id, status, model, fine_tuned_model, created_at, finished_at, trained_tokens, error.
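
The returned dictionary can be condensed into a one-line summary, for example for logging. The sketch below uses only the keys listed above; describe_status is a hypothetical helper, and the exact shape of the error value is not documented here, so it is formatted as-is.

```python
from typing import Any, Dict


def describe_status(info: Dict[str, Any]) -> str:
    """Format a get_status() result as a short log line (hypothetical helper)."""
    line = f"{info['id']}: {info['status']}"
    # fine_tuned_model is only expected to be set once training has finished.
    if info.get("fine_tuned_model"):
        line += f" -> {info['fine_tuned_model']}"
    # Assumption: error is empty or None unless the job failed.
    if info.get("error"):
        line += f" (error: {info['error']})"
    return line
```

Typical usage would be print(describe_status(tuner.get_status(job_id))).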

list_jobs(limit=10)

Return the most recent fine-tuning jobs (newest first).

Return type: list[dict[str, Any]]
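
One possible use of the job list is to look up the most recent successfully trained model. The sketch below assumes each entry carries the same status and fine_tuned_model keys as get_status(), which this page does not state explicitly, and that a finished job reports the status "succeeded" as in the OpenAI API. latest_succeeded_model is a hypothetical helper.

```python
from typing import Any, Optional


def latest_succeeded_model(tuner: Any, limit: int = 10) -> Optional[str]:
    """Return the newest fine-tuned model name among recent jobs, if any."""
    # list_jobs() is documented as newest first, so the first match wins.
    for job in tuner.list_jobs(limit=limit):
        # Assumption: entries expose "status" and "fine_tuned_model" keys.
        if job.get("status") == "succeeded" and job.get("fine_tuned_model"):
            return job["fine_tuned_model"]
    return None
```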

list_events(job_id, limit=20)

Return recent training log events for a job.

Return type: list[dict[str, Any]]

cancel_job(job_id)

Cancel a running fine-tuning job.

Return type: dict[str, Any]

wait_for_completion(job_id, *, poll_interval=30, verbose=True)

Block until a fine-tuning job finishes.

Parameters:

job_id (str) – The job ID returned by create_job().
poll_interval (int) – Seconds between status-check API calls.
verbose (bool) – Print status lines to stdout.

Return type:

str

Returns:

str – The fine-tuned model name ready for use in OpenAILLMKit.

Raises:

RuntimeError – If the job ends in "failed" or "cancelled" state.
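
To illustrate the blocking behaviour, here is a minimal polling loop built on get_status(). It is a sketch of the general technique, not the library's actual implementation. poll_until_done, max_polls, and the injectable sleep argument are hypothetical additions, and the terminal status names assume the OpenAI API values "succeeded", "failed", and "cancelled".

```python
import time
from typing import Any, Callable, Optional


def poll_until_done(
    tuner: Any,
    job_id: str,
    poll_interval: float = 30.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the job reaches a terminal state."""
    polls = 0
    while True:
        info = tuner.get_status(job_id)
        status = info["status"]
        if status == "succeeded":
            return info["fine_tuned_model"]
        if status in ("failed", "cancelled"):
            raise RuntimeError(
                f"Job {job_id} ended with status {status!r}: {info.get('error')}"
            )
        polls += 1
        # Optional upper bound, which wait_for_completion() does not expose.
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"Job {job_id} still {status!r} after {polls} polls")
        sleep(poll_interval)
```

A hand-written loop like this is only worth the effort when an upper bound on waiting time or custom logging is needed; otherwise wait_for_completion() covers the common case.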

run_pipeline(dataset, model='gpt-4o-mini-2024-07-18', *, validation_dataset=None, n_epochs='auto', batch_size='auto', learning_rate_multiplier='auto', suffix=None, poll_interval=30, verbose=True)

Validate → upload → train → wait in a single call.

This is the recommended entry-point for most use cases.

Parameters:

dataset (RactoDataset) – Training examples.
model (str) – Base model to fine-tune.
validation_dataset (RactoDataset | None) – Optional held-out validation set (uploaded separately).
n_epochs (int | str) – Number of training epochs. Pass "auto" to let OpenAI decide.
batch_size (int | str) – Number of examples per training batch. Pass "auto" to let OpenAI decide.
learning_rate_multiplier (float | str) – Scaling factor for the default learning rate. Pass "auto" to let OpenAI decide.
suffix (str | None) – Short label appended to the fine-tuned model name.
poll_interval (int) – Seconds between status polls while waiting.
verbose (bool) – Print progress to stdout.

Return type:

str

Returns:

str – Fine-tuned model identifier — pass directly to OpenAIDeveloperKit(model=...):

kit = opd.OpenAIDeveloperKit(model=fine_tuned_model)

Raises:

ValueError – If dataset validation fails.
RuntimeError – If the fine-tuning job fails remotely.
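
The two documented exceptions separate local problems from remote ones, so a caller may want to handle them differently. The wrapper below is a sketch under that assumption; try_fine_tune is a hypothetical name and tuner is assumed to be an OpenAIFineTuner.

```python
from typing import Any, Optional


def try_fine_tune(tuner: Any, dataset: Any, **kwargs: Any) -> Optional[str]:
    """Call run_pipeline() and report the two documented failure modes."""
    try:
        return tuner.run_pipeline(dataset, **kwargs)
    except ValueError as exc:
        # Raised when dataset validation fails, the first stage of the pipeline.
        print(f"Dataset rejected by validation: {exc}")
    except RuntimeError as exc:
        # Raised when the remote fine-tuning job does not succeed.
        print(f"Fine-tuning job failed remotely: {exc}")
    return None
```

A ValueError usually means the dataset should be fixed and resubmitted, while a RuntimeError points to the job itself, where list_events() may give more detail.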