Models need a harness to become fully useful, and improving that harness means continuously iterating on context, evaluation, and model-specific tuning.