Resources

Research
From experiments to insights: Orq.ai research hub
Filter by theme:
Large Language Models
Generative AI
Prompt Engineering
RAG-as-a-Service

System Prompt Placement for LLM-as-a-Judge
Every team building an LLM-as-a-judge pipeline faces the same question: does it matter where you put your instructions? We tested it across models to get a real answer.

Prompt Optimization: How to Make Smaller Models Punch Above Their Weight
We explored using prompt optimization to make cheaper, smaller language models perform as well as expensive, more capable ones. We achieved up to 4x performance improvements for trace classification, while learning important lessons about overfitting along the way.

Can a 14B Model Match a 100B+ Model? We Fine-Tuned 8+ Models to Find Out
Key takeaways from fine-tuning 8+ language models on a text classification task, from tiny 0.6B models to 14B behemoths.
