Google recently introduced Gemma3-270M, a smaller Gemma3 model with "only" 270 million parameters instead of billions.
The most interesting aspect of this model to me is that it is explicitly intended to run locally, without requiring specialized datacenter infrastructure. The potential to run the model air-gapped, isolated from the outside world, would be interesting for some future stuff I'm working on.
The eventual uses would involve communication in German, so I decided to try fine-tuning the model to answer questions in German specifically. As a starting point I referenced an existing Colab notebook that fine-tunes Gemma3-270M to predict chess moves. Chess isn't a particularly interesting application for LLMs to me personally, since we have better ways to use neural networks to play chess, but the training flow is the same.
We start by loading dependencies and instantiating the gemma-3-270m-it model.
%%capture
import os

if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft
    !pip install --no-deps trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth

from unsloth import FastModel
import torch

max_seq_length = 2048

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3-270m-it",
    max_seq_length = max_seq_length,  # Choose any for long context!
    load_in_4bit = False,     # 4 bit quantization to reduce memory
    load_in_8bit = False,     # [NEW!] A bit more accurate, uses 2x memory
    full_finetuning = False,  # [NEW!] We have full finetuning now!
    # token = "hf_...",       # use one if using gated models
)
We set it up to accept training data in a chat format using the Hugging Face deepset/germanquad dataset, a curated question-answering dataset built from the German Wikipedia and various academic sources.
model = FastModel.get_peft_model(
    model,
    r = 128,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 128,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,  # Seems pretty random
    use_rslora = False,
    loftq_config = None,
)

from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(tokenizer, chat_template = "gemma3")

from datasets import load_dataset
dataset = load_dataset("deepset/germanquad", split = "train[:10000]")

# Map each germanquad record to a system/user/assistant conversation:
# the Wikipedia context becomes the system message, the question the user
# turn, and the first reference answer the assistant turn.
def convert_to_chatml(example):
    return {
        "conversations": [
            {"role": "system", "content": example["context"]},
            {"role": "user", "content": example["question"]},
            {"role": "assistant", "content": example["answers"]["text"][0]},
        ]
    }

dataset = dataset.map(convert_to_chatml)

# Render each conversation into a plain training string with the gemma3
# chat template; strip <bos> here since the trainer's tokenizer adds it again.
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize = False, add_generation_prompt = False
        ).removeprefix('<bos>')
        for convo in convos
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched = True)

from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    eval_dataset = None,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 1,
        warmup_steps = 5,
        num_train_epochs = 1,
        max_steps = 100,
        learning_rate = 5e-5,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
    ),
)

# Mask the loss on everything except the model's responses, so training
# only teaches the assistant turns rather than the prompts.
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<start_of_turn>user\n",
    response_part = "<start_of_turn>model\n",
)
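Before training, it's worth sanity-checking that the conversion and chat template did what we expect. A minimal sketch; the exact rendered layout depends on the gemma3 template version that unsloth ships:

# Inspect one converted record: the raw conversation and its rendered form.
# The rendered string should contain <start_of_turn>user ... <start_of_turn>model turns.
print(dataset[0]["conversations"])
print(dataset[0]["text"])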
We then train the model. This took about three minutes on Google Colab using a Tesla T4 GPU.
trainer_stats = trainer.train()
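trainer.train() returns a TrainOutput whose metrics record how the run went. A quick sketch for reading them; the key names come from transformers and may vary slightly between versions:

# Rough post-training check of runtime and loss.
print(trainer_stats.metrics.get("train_runtime"))  # seconds spent training
print(trainer_stats.metrics.get("train_loss"))     # mean training loss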
Now, the real test: can it give good answers to questions not in its training data?
messages = [
    {'role': 'system', 'content': 'Bielefeld'},
    {'role': 'user',   'content': 'Gibt es Bielefeld?'},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True,  # Must add for generation
).removeprefix('<bos>')

from transformers import TextStreamer
_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 125,
    temperature = 1, top_p = 0.95, top_k = 64,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

<bos><start_of_turn>user
Gibt es Bielefeld?
<end_of_turn>
<start_of_turn>model
Ja
<end_of_turn>
Indeed yes, it can!
If that interaction doesn't make much sense: it is a German joke, alleging that the city of Bielefeld doesn't actually exist. Wikipedia has an explanation in English.
The trained model says that Bielefeld does exist. Clearly it has no sense of humor.
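For the air-gapped use mentioned at the top, the fine-tuned weights have to leave Colab. A minimal sketch, assuming the standard PEFT saving API; the directory name here is made up:

# Save the LoRA adapters and tokenizer locally (directory name is arbitrary).
model.save_pretrained("gemma3-270m-germanquad-lora")
tokenizer.save_pretrained("gemma3-270m-germanquad-lora")
# Unsloth can also merge the adapters into the base weights for standalone use
# (e.g. via model.save_pretrained_merged), if that API is available in your version.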