ractogateway.finetune.dataset
Training dataset primitives for multimodal LLM fine-tuning.
Classes
- RactoTrainingMessage
One turn in a training conversation (role + text + optional file attachments).
- RactoTrainingExample
A complete multi-turn conversation used as a single training record.
- RactoDataset
Ordered collection of examples with validation, splitting, and JSONL export.
- class ractogateway.finetune.dataset.RactoTrainingMessage(role, content, attachments=<factory>)[source]
Bases: object
One conversational turn inside a training example.
- Parameters:
- role: Literal['system', 'user', 'assistant']
- content: str
- attachments: list[RactoFile]
- to_openai()[source]
Return an OpenAI-compatible message dict.
Text-only messages produce {"role": ..., "content": str}. Messages with attachments produce a content-block list: {"role": ..., "content": [image_url_block, ..., text_block]}.
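The two output shapes can be illustrated with plain dicts. This is a minimal sketch, not the library's implementation: `sketch_openai_message` is a hypothetical helper, and the block layout (`image_url` and `text` types) follows the OpenAI chat message format on the assumption that `to_openai()` emits the same structure.

```python
from __future__ import annotations

from typing import Any


def sketch_openai_message(
    role: str, text: str, image_urls: list[str] | None = None
) -> dict[str, Any]:
    """Hypothetical stand-in that mirrors the two shapes described above."""
    if not image_urls:
        # Text-only: content stays a plain string.
        return {"role": role, "content": text}
    # With attachments: image blocks first, then a single text block.
    blocks: list[dict[str, Any]] = [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    blocks.append({"type": "text", "text": text})
    return {"role": role, "content": blocks}


text_only = sketch_openai_message("user", "What is 2 + 2?")
with_image = sketch_openai_message(
    "user", "Describe this chart.", ["data:image/png;base64,AAAA"]
)
```

The data URL above is a placeholder; how `RactoFile` attachments are actually encoded is not stated in this reference.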
- to_anthropic()[source]
Return an Anthropic-compatible message dict.
System messages should be lifted to the top-level system field; RactoTrainingExample.to_anthropic_dict() handles this automatically.
- class ractogateway.finetune.dataset.RactoTrainingExample(messages)[source]
Bases: object
A complete conversation used as one training record.
- Parameters:
messages (list[RactoTrainingMessage]) – Ordered turns. Typical shapes:
Single-turn: [user, assistant]
With system: [system, user, assistant]
Multi-turn: [system, user, assistant, user, assistant, …]
Examples
>>> ex = RactoTrainingExample.from_pair(
...     user="What is 2 + 2?",
...     assistant="4",
...     system="You are a maths tutor.",
... )
>>> # Multimodal example (image + question)
>>> ex = RactoTrainingExample.from_pair(
...     user="Describe this chart.",
...     assistant="The chart shows monthly revenue for Q4 2024.",
...     user_attachments=[RactoFile.from_path("chart.png")],
... )
- classmethod from_pair(user, assistant, *, system='', user_attachments=None)[source]
Create a single-turn (prompt → completion) training example.
- classmethod from_conversation(turns)[source]
Build from a list of (role, content) tuples.
- to_openai_dict()[source]
Serialize to OpenAI fine-tuning JSONL record.
Output format:
{"messages": [{"role": "system", "content": "…"}, …]}
- to_anthropic_dict()[source]
Serialize to Anthropic fine-tuning JSONL record.
Output format:
{"system": "…", "messages": [{"role": "user", …}, …]}
The system key is only present when a system message exists.
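The system-lifting behaviour described above can be sketched without the library. `sketch_anthropic_record` is a hypothetical helper operating on plain (role, content) tuples; joining several system messages with a newline is an assumption, since the reference does not say how more than one is handled.

```python
from __future__ import annotations

from typing import Any


def sketch_anthropic_record(turns: list[tuple[str, str]]) -> dict[str, Any]:
    """Hypothetical stand-in: lift system turns out of the message list."""
    system_parts = [content for role, content in turns if role == "system"]
    messages = [
        {"role": role, "content": content}
        for role, content in turns
        if role != "system"
    ]
    record: dict[str, Any] = {}
    if system_parts:
        # Assumption: multiple system messages are joined with a newline.
        record["system"] = "\n".join(system_parts)
    record["messages"] = messages
    return record


with_system = sketch_anthropic_record(
    [
        ("system", "You are a maths tutor."),
        ("user", "What is 2 + 2?"),
        ("assistant", "4"),
    ]
)
without_system = sketch_anthropic_record([("user", "Hi"), ("assistant", "Hello")])
```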
- to_gemini_dict()[source]
Serialize to Gemini tuning record.
For text-only single-turn examples (most common) the output is:
{"text_input": "…", "output": "…"}
For multimodal or multi-turn examples the Vertex AI contents format is used:
{"contents": [{"role": "user", "parts": […]}, …]}
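A rough sketch of the choice between the two Gemini shapes, restricted to text-only user and assistant turns. `sketch_gemini_record` is hypothetical. Mapping the assistant role to "model" follows the Vertex AI contents convention and is assumed here, and system turns are left out because this reference does not say how they are serialized.

```python
from __future__ import annotations

from typing import Any

# Assumption: Vertex AI "contents" uses "model" where other providers say "assistant".
ROLE_MAP = {"user": "user", "assistant": "model"}


def sketch_gemini_record(turns: list[tuple[str, str]]) -> dict[str, Any]:
    """Hypothetical stand-in for text-only conversations without system turns."""
    roles = [role for role, _ in turns]
    if roles == ["user", "assistant"]:
        # Single-turn, text-only: the compact input/output pair.
        return {"text_input": turns[0][1], "output": turns[1][1]}
    # Anything longer falls back to the contents list.
    return {
        "contents": [
            {"role": ROLE_MAP[role], "parts": [{"text": content}]}
            for role, content in turns
        ]
    }


single = sketch_gemini_record([("user", "What is 2 + 2?"), ("assistant", "4")])
multi = sketch_gemini_record(
    [
        ("user", "What is 2 + 2?"),
        ("assistant", "4"),
        ("user", "And 3 + 3?"),
        ("assistant", "6"),
    ]
)
```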
- class ractogateway.finetune.dataset.RactoDataset(examples=None)[source]
Bases: object
An ordered collection of RactoTrainingExample objects.
This is the central data container for building, validating, splitting, and exporting fine-tuning datasets for any supported LLM provider.
- Parameters:
examples (list[RactoTrainingExample] | None) – Initial examples. An empty dataset is created when omitted.
Examples
Build from (user, assistant) pairs:
ds = RactoDataset.from_pairs(
    [
        ("What is Python?", "Python is a high-level programming language."),
        ("What is a list?", "A list is a mutable ordered sequence."),
    ],
    system="You are a Python tutor.",
)
Add multimodal examples manually:
ds.add(
    RactoTrainingExample.from_pair(
        user="Describe this image.",
        assistant="The image shows a flowchart with three decision nodes.",
        user_attachments=[RactoFile.from_path("diagram.png")],
    )
)
Export to JSONL for fine-tuning:
train_ds, val_ds = ds.split(0.8, seed=42)
train_ds.export_jsonl("train.jsonl", provider="openai")
val_ds.export_jsonl("val.jsonl", provider="openai")
- classmethod from_pairs(pairs, *, system='')[source]
Build a text-only dataset from (user, assistant) pairs.
- classmethod from_jsonl(path, provider='openai')[source]
Load a JSONL dataset previously exported for provider.
Supports text-only OpenAI, Anthropic, and Gemini formats.
- Parameters:
- Return type:
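To make the round trip concrete, here is a minimal sketch of reading OpenAI-format JSONL text back into (role, content) turns. `sketch_parse_openai_jsonl` is a hypothetical helper; skipping blank lines is an assumption, and the sketch covers only the text-only OpenAI shape shown under to_openai_dict().

```python
from __future__ import annotations

import json


def sketch_parse_openai_jsonl(text: str) -> list[list[tuple[str, str]]]:
    """Hypothetical stand-in: one JSON object per line, one example per object."""
    examples: list[list[tuple[str, str]]] = []
    for line in text.splitlines():
        if not line.strip():
            # Assumption: blank lines are ignored rather than treated as errors.
            continue
        record = json.loads(line)
        examples.append([(m["role"], m["content"]) for m in record["messages"]])
    return examples


sample = (
    '{"messages": [{"role": "user", "content": "Hi"}, '
    '{"role": "assistant", "content": "Hello"}]}\n'
    "\n"
    '{"messages": [{"role": "user", "content": "Bye"}, '
    '{"role": "assistant", "content": "Goodbye"}]}\n'
)
parsed = sketch_parse_openai_jsonl(sample)
```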
- shuffle(seed=None)[source]
Return a new dataset with examples in random order.
- split(train_ratio=0.8, *, seed=None)[source]
Split into train and validation datasets.
- Parameters:
- Return type:
- Returns:
tuple[RactoDataset, RactoDataset] – (train_dataset, validation_dataset)
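A seeded split can be sketched on a plain list as follows. `sketch_split` is hypothetical. Whether the library shuffles before cutting, and how it rounds the cut point, is not stated here, so both are assumptions of this sketch.

```python
from __future__ import annotations

import random
from typing import TypeVar

T = TypeVar("T")


def sketch_split(
    examples: list[T], train_ratio: float = 0.8, seed: int | None = None
) -> tuple[list[T], list[T]]:
    """Hypothetical stand-in: shuffle a copy, then cut at train_ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(examples)  # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    cut = int(len(shuffled) * train_ratio)  # assumption: round down
    return shuffled[:cut], shuffled[cut:]


items = list(range(10))
train, val = sketch_split(items, 0.8, seed=42)
train_again, val_again = sketch_split(items, 0.8, seed=42)
```

Using a private `random.Random(seed)` instance keeps the split reproducible without touching the global random state.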
- validate(provider='openai')[source]
Check examples for common formatting errors.
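The reference does not list which checks validate() runs. As an illustration only, the sketch below shows the kind of formatting checks such a method might perform on (role, content) turns; `sketch_validate` and every rule in it are assumptions.

```python
from __future__ import annotations

VALID_ROLES = {"system", "user", "assistant"}


def sketch_validate(examples: list[list[tuple[str, str]]]) -> list[str]:
    """Hypothetical checks; returns human-readable problems, empty when clean."""
    problems: list[str] = []
    for i, turns in enumerate(examples):
        if not turns:
            problems.append(f"example {i}: no messages")
            continue
        for j, (role, content) in enumerate(turns):
            if role not in VALID_ROLES:
                problems.append(f"example {i}, message {j}: unknown role {role!r}")
            if not content.strip():
                problems.append(f"example {i}, message {j}: empty content")
        if turns[-1][0] != "assistant":
            # A training record normally ends with the completion to learn.
            problems.append(f"example {i}: last message is not from the assistant")
    return problems


clean = sketch_validate([[("user", "Hi"), ("assistant", "Hello")]])
broken = sketch_validate([[("user", "Hi")], [("bot", "x"), ("assistant", " ")]])
```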
- to_jsonl_string(provider='openai')[source]
Serialize all examples to a JSONL string for provider.
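JSONL means one JSON object per line. A minimal sketch of that serialization with the standard library is below. `sketch_jsonl_string` is hypothetical, and whether the library appends a trailing newline or escapes non-ASCII text is not stated here.

```python
from __future__ import annotations

import json
from typing import Any


def sketch_jsonl_string(records: list[dict[str, Any]]) -> str:
    """Hypothetical stand-in: one compact JSON document per line."""
    # ensure_ascii=False keeps non-ASCII text readable (an assumption here).
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


records = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello"}]},
    {"messages": [{"role": "user", "content": "Café?"},
                  {"role": "assistant", "content": "Oui"}]},
]
jsonl = sketch_jsonl_string(records)
round_trip = [json.loads(line) for line in jsonl.splitlines()]
```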
- export_jsonl(path, provider='openai', *, overwrite=False)[source]
Write the dataset to a .jsonl file on disk.
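The overwrite flag suggests that an existing file is protected by default. One way to get that behaviour is Python's exclusive-creation mode, sketched below. `sketch_export_jsonl` is hypothetical, and raising FileExistsError is an assumption about what a refusal looks like, not the library's documented behaviour.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


def sketch_export_jsonl(
    records: list[dict[str, Any]], path: str | Path, overwrite: bool = False
) -> Path:
    """Hypothetical stand-in: write one JSON object per line."""
    target = Path(path)
    # Mode "x" creates the file and raises FileExistsError if it already exists.
    mode = "w" if overwrite else "x"
    with target.open(mode, encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return target


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "train.jsonl"
    sketch_export_jsonl([{"messages": []}], out)
    try:
        sketch_export_jsonl([], out)  # second write without overwrite=True
        refused = False
    except FileExistsError:
        refused = True
    line_count = len(out.read_text(encoding="utf-8").splitlines())
```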