How We Test
hvilkenAI.no tests AI models on practical, everyday tasks — not academic benchmarks.
Independent and Unbiased
hvilkenAI has no affiliate agreements, sponsors, or commercial partnerships with the AI providers we test. We receive no commission, discounts, or benefits from any model provider. All recommendations are based solely on test results. We are funded by subscription revenue from Pro users and advertising — never by the providers we evaluate.
Our Philosophy
What We Measure
Norwegian Language Quality (0–5)
How well does the model understand and write Norwegian Bokmål? Did it respond in Norwegian, or did it fall back to English?
Instruction Following (0–5)
Does the model do what you actually ask for? Correct length, format, and content matter.
Speed (tokens/second)
How quickly do you get a response? We measure tokens per second and time to first token (TTFT).
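As a rough illustration of the two speed metrics, the sketch below derives TTFT and tokens per second from the arrival times of a streamed response. The `StreamTiming` class and both function names are hypothetical, and measuring throughput from the first token to the last (so that TTFT is not counted twice) is an assumption, not a description of hvilkenAI's actual measurement code.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Hypothetical record of one streamed response (all times in seconds)."""
    request_sent: float        # when the request was sent
    token_times: list[float]   # arrival time of each streamed token


def time_to_first_token(t: StreamTiming) -> float:
    """Seconds from sending the request until the first token arrives."""
    return t.token_times[0] - t.request_sent


def tokens_per_second(t: StreamTiming) -> float:
    """Throughput measured from the first token to the last (assumption)."""
    duration = t.token_times[-1] - t.token_times[0]
    if duration <= 0:
        raise ValueError("need at least two tokens with distinct timestamps")
    # The first token opens the window, so only the tokens after it are counted.
    return (len(t.token_times) - 1) / duration


# Made-up timings: first token after 0.5 s, then one token every 0.25 s.
timing = StreamTiming(request_sent=0.0, token_times=[0.5, 0.75, 1.0, 1.25, 1.5])
print(time_to_first_token(timing))  # 0.5
print(tokens_per_second(timing))    # 4.0
```

Keeping the two numbers separate matters in practice: a model can start answering quickly but stream slowly, or the reverse.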
Price (kr per million tokens)
What does it cost in Norwegian kroner? Updated daily based on exchange rates.
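The conversion itself is simple arithmetic. The sketch below assumes that providers list prices in US dollars per million tokens and that a daily USD to NOK rate is available. The function name, the list price, and the exchange rate are all illustrative, not real data.

```python
def price_in_nok(usd_per_million_tokens: float, usd_to_nok: float) -> float:
    """Convert a USD price per million tokens to kroner, rounded to two decimals."""
    if usd_per_million_tokens < 0 or usd_to_nok <= 0:
        raise ValueError("price must be non-negative and the rate positive")
    return round(usd_per_million_tokens * usd_to_nok, 2)


# Hypothetical numbers: 3.00 USD per million tokens at 10.50 kr per USD.
print(price_in_nok(3.00, 10.50))  # 31.5
```

Because the rate changes daily, the same dollar price can produce a different kroner price from one day to the next.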
Overall Score (0–10)
A weighted total that combines Norwegian language quality, instruction following, speed, and value per krone.
Orchestrator Score (0–10) — unique to hvilkenAI.no
How well suited is the model to control other AI models in Norwegian? Calculated as Norwegian × instruction. Multiplication penalises weakness in either dimension: a model that doesn’t write Norwegian cannot orchestrate effectively in Norwegian, however well it follows instructions.
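To see why multiplication is stricter than averaging, consider the sketch below. The text only says that the score is calculated from Norwegian × instruction. How the product of two 0–5 scores is mapped onto the 0–10 scale is not stated, so the linear rescaling here (dividing by the maximum product of 25 and multiplying by 10) is an assumption, and the function name is hypothetical.

```python
def orchestrator_score(norwegian: float, instruction: float) -> float:
    """Multiply two 0-5 scores and rescale the product to 0-10 (assumed scaling)."""
    for name, value in (("norwegian", norwegian), ("instruction", instruction)):
        if not 0 <= value <= 5:
            raise ValueError(f"{name} must be between 0 and 5")
    # The largest possible product is 5 * 5 = 25, which maps to a score of 10.
    return norwegian * instruction * 10 / 25


print(orchestrator_score(5, 5))  # 10.0
print(orchestrator_score(5, 1))  # 2.0
print(orchestrator_score(0, 5))  # 0.0
```

Under this scaling, a model scoring 5 and 1 ends up at 2.0, whereas a plain average of the two scores would have placed it at 6 out of 10. A zero in either dimension gives a zero overall.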
View orchestrator ranking →
Model Selection and Test Frequency
Focus
Change Log — What We’ve Discovered
Real observations from daily benchmarking. This is what quarterly reports miss.
Why Daily Testing?
Most AI benchmarks are published monthly or quarterly. But AI models are updated continuously, often without the provider announcing it. A model that was best last week may have dropped to fifth place this week.
We’ve caught several such "silent updates" because a model’s score suddenly changed. A quarterly report misses these shifts. Daily testing captures them as they happen.
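One simple way to flag a possible silent update is to compare each day's score with the previous day's. The sketch below is a hypothetical illustration, not hvilkenAI's actual detection method: the function name, the threshold of 1.0 points, and the sample scores are all made up.

```python
def find_sudden_changes(daily_scores: list[float], threshold: float = 1.0) -> list[int]:
    """Return the indices of days whose score moved by at least `threshold`
    compared with the day before."""
    return [
        day
        for day in range(1, len(daily_scores))
        if abs(daily_scores[day] - daily_scores[day - 1]) >= threshold
    ]


# Made-up daily scores for one model: stable, then a sharp drop on day 3.
scores = [8.0, 8.0, 8.5, 6.0, 6.0]
print(find_sudden_changes(scores))  # [3]
```

A flagged day is only a hint. A jump in score could also come from test noise or a change in the test itself, so it would still need a manual check.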
For businesses using AI in daily operations, this means your decisions always rest on current data. You don’t need to wait three months for the next report to know whether you’re using the right model.
See Also