How far can a mid-sized language model go if the real innovation moves from the backbone into the agent scaffold and tool stack?