sudacode/dotfiles

Fork 0

mirror of https://github.com/ksyasuda/dotfiles.git synced 2026-03-20 06:11:27 -07:00

Files

sudacode f9a530667e

update skills

2026-03-17 16:53:22 -07:00

18 KiB

Raw Blame History

GPT-5.4 prompting upgrade guide

Use this guide when prompts written for older models need to be adapted for GPT-5.4 during an upgrade. Start lean: keep the model-string change narrow, preserve the original task intent, and add only the smallest prompt changes needed to recover behavior.

Default upgrade posture

Start with model string only whenever the old prompt is already short, explicit, and task-bounded.
Move to model string + light prompt rewrite only when regressions appear in completeness, persistence, citation quality, verification, or verbosity.
Prefer one or two targeted prompt additions over a broad rewrite.
Treat reasoning effort as a last-mile knob. Start lower, then increase only after prompt-level fixes and evals.
Before increasing reasoning effort, first add a completeness contract, a verification loop, and tool persistence rules - depending on the usage case.
If the workflow clearly depends on implementation changes rather than prompt changes, treat it as blocked for prompt-only upgrade guidance.
Do not classify a case as blocked just because the workflow uses tools; block only if the upgrade requires changing tool definitions, wiring, or other implementation details.

Behavioral differences to account for

Current GPT-5.4 upgrade guidance suggests these strengths:

stronger personality and tone adherence, with less drift over long answers
better long-horizon and agentic workflow stamina
stronger spreadsheet, finance, and formatting tasks
more efficient tool selection and fewer unnecessary calls by default
stronger structured generation and classification reliability

The main places where prompt guidance still helps are:

retrieval-heavy workflows that need persistent tool use and explicit completeness
research and citation discipline
verification before irreversible or high-impact actions
terminal and tool workflow hygiene
defaults and implied follow-through
verbosity control for compact, information-dense answers

Start with the smallest set of instructions that preserves correctness. Add the prompt blocks below only for workflows that actually need them.

Prompt rewrite patterns

Older prompt pattern	GPT-5.4 adjustment	Why	Example addition
Long, repetitive instructions that compensate for weaker instruction following	Remove duplicate scaffolding and keep only the constraints that materially change behavior	GPT-5.4 usually needs less repeated steering	Replace repeated reminders with one concise rule plus a verification block
Fast assistant prompt with no verbosity control	Keep the prompt as-is first; add a verbosity clamp only if outputs become too long	Many GPT-4o or GPT-4.1 upgrades work with just a model-string swap	Add `output_verbosity_spec` only after a verbosity regression
Tool-heavy agent prompt that assumes the model will keep searching until complete	Add persistence and verification rules	GPT-5.4 may use fewer tool calls by default for efficiency	Add `tool_persistence_rules` and `verification_loop`
Tool-heavy workflow where later actions depend on earlier lookup or retrieval	Add prerequisite and missing-context rules before action steps	GPT-5.4 benefits from explicit dependency-aware routing when context is still thin	Add `dependency_checks` and `missing_context_gating`
Retrieval workflow with several independent lookups	Add selective parallelism guidance	GPT-5.4 is strong at parallel tool use, but should not parallelize dependent steps	Add `parallel_tool_calling`
Batch workflow prompt that often misses items	Add an explicit completeness contract	Item accounting benefits from direct instruction	Add `completeness_contract`
Research prompt that needs grounding and citation discipline	Add research, citation, and empty-result recovery blocks	Multi-pass retrieval is stronger when the model is told how to react to weak or empty search results	Add `research_mode`, `citation_rules`, and `empty_result_handling`; add `tool_persistence_rules` when retrieval tools are already in use
Coding or terminal prompt with shell misuse or early stop failures	Keep the same tool surface and add terminal hygiene and verification instructions	Tool-using coding workflows are not blocked just because tools exist; they usually need better prompt steering, not host rewiring	Add `terminal_tool_hygiene` and `verification_loop`, optionally `tool_persistence_rules`
Multi-agent or support-triage workflow with escalation or completeness requirements	Add one lightweight control block for persistence, completeness, or verification	GPT-5.4 can be more efficient by default, so multi-step support flows benefit from an explicit completion or verification contract	Add at least one of `tool_persistence_rules`, `completeness_contract`, or `verification_loop`

Prompt blocks

Use these selectively. Do not add all of them by default.

`output_verbosity_spec`

Use when:

the upgraded model gets too wordy
the host needs compact, information-dense answers
the workflow benefits from a short overview plus a checklist

<output_verbosity_spec>
- Default: 3-6 sentences or up to 6 bullets.
- If the user asked for a doc or report, use headings with short bullets.
- For multi-step tasks:
  - Start with 1 short overview paragraph.
  - Then provide a checklist with statuses: [done], [todo], or [blocked].
- Avoid repeating the user's request.
- Prefer compact, information-dense writing.
</output_verbosity_spec>

`default_follow_through_policy`

Use when:

the host expects the model to proceed on reversible, low-risk steps
the upgraded model becomes too conservative or asks for confirmation too often

<default_follow_through_policy>
- If the user's intent is clear and the next step is reversible and low-risk, proceed without asking permission.
- Only ask permission if the next step is:
  (a) irreversible,
  (b) has external side effects, or
  (c) requires missing sensitive information or a choice that materially changes outcomes.
- If proceeding, state what you did and what remains optional.
</default_follow_through_policy>

`instruction_priority`

Use when:

users often change task shape, format, or tone mid-conversation
the host needs an explicit override policy instead of relying on defaults

<instruction_priority>
- User instructions override default style, tone, formatting, and initiative preferences.
- Safety, honesty, privacy, and permission constraints do not yield.
- If a newer user instruction conflicts with an earlier one, follow the newer instruction.
- Preserve earlier instructions that do not conflict.
</instruction_priority>

`tool_persistence_rules`

Use when:

the workflow needs multiple retrieval or verification steps
the model starts stopping too early because it is trying to save tool calls

<tool_persistence_rules>
- Use tools whenever they materially improve correctness, completeness, or grounding.
- Do not stop early just to save tool calls.
- Keep calling tools until:
  (1) the task is complete, and
  (2) verification passes.
- If a tool returns empty or partial results, retry with a different strategy.
</tool_persistence_rules>

`dig_deeper_nudge`

Use when:

the model is too literal or stops at the first plausible answer
the task is safety- or accuracy-sensitive and needs a small initiative nudge before raising reasoning effort

<dig_deeper_nudge>
- Do not stop at the first plausible answer.
- Look for second-order issues, edge cases, and missing constraints.
- If the task is safety- or accuracy-critical, perform at least one verification step.
</dig_deeper_nudge>

`dependency_checks`

Use when:

later actions depend on prerequisite lookup, memory retrieval, or discovery steps
the model may be tempted to skip prerequisite work because the intended end state seems obvious

<dependency_checks>
- Before taking an action, check whether prerequisite discovery, lookup, or memory retrieval is required.
- Do not skip prerequisite steps just because the intended final action seems obvious.
- If a later step depends on the output of an earlier one, resolve that dependency first.
</dependency_checks>

`parallel_tool_calling`

Use when:

the workflow has multiple independent retrieval steps
wall-clock time matters but some steps still need sequencing

<parallel_tool_calling>
- When multiple retrieval or lookup steps are independent, prefer parallel tool calls to reduce wall-clock time.
- Do not parallelize steps with prerequisite dependencies or where one result determines the next action.
- After parallel retrieval, pause to synthesize before making more calls.
- Prefer selective parallelism: parallelize independent evidence gathering, not speculative or redundant tool use.
</parallel_tool_calling>

`completeness_contract`

Use when:

the task involves batches, lists, enumerations, or multiple deliverables
missing items are a common failure mode

<completeness_contract>
- Deliver all requested items.
- Maintain an itemized checklist of deliverables.
- For lists or batches:
  - state the expected count,
  - enumerate items 1..N,
  - confirm that none are missing before finalizing.
- If any item is blocked by missing data, mark it [blocked] and state exactly what is missing.
</completeness_contract>

`empty_result_handling`

Use when:

the workflow frequently performs search, CRM, logs, or retrieval steps
no-results failures are often false negatives

<empty_result_handling>
If a lookup returns empty or suspiciously small results:
- Do not conclude that no results exist immediately.
- Try at least 2 fallback strategies, such as a broader query, alternate filters, or another source.
- Only then report that no results were found, along with what you tried.
</empty_result_handling>

`verification_loop`

Use when:

the workflow has downstream impact
accuracy, formatting, or completeness regressions matter

<verification_loop>
Before finalizing:
- Check correctness: does the output satisfy every requirement?
- Check grounding: are factual claims backed by retrieved sources or tool output?
- Check formatting: does the output match the requested schema or style?
- Check safety and irreversibility: if the next step has external side effects, ask permission first.
</verification_loop>

`missing_context_gating`

Use when:

required context is sometimes missing early in the workflow
the model should prefer retrieval over guessing

<missing_context_gating>
- If required context is missing, do not guess.
- Prefer the appropriate lookup tool when the context is retrievable; ask a minimal clarifying question only when it is not.
- If you must proceed, label assumptions explicitly and choose a reversible action.
</missing_context_gating>

`action_safety`

Use when:

the agent will actively take actions through tools
the host benefits from a short pre-flight and post-flight execution frame

<action_safety>
- Pre-flight: summarize the intended action and parameters in 1-2 lines.
- Execute via tool.
- Post-flight: confirm the outcome and any validation that was performed.
</action_safety>

`citation_rules`

Use when:

the workflow produces cited answers
fabricated citations or wrong citation formats are costly

<citation_rules>
- Only cite sources that were actually retrieved in this session.
- Never fabricate citations, URLs, IDs, or quote spans.
- If you cannot find a source for a claim, say so and either:
  - soften the claim, or
  - explain how to verify it with tools.
- Use exactly the citation format required by the host application.
</citation_rules>

`research_mode`

Use when:

the workflow is research-heavy
the host uses web search or retrieval tools

<research_mode>
- Do research in 3 passes:
  1) Plan: list 3-6 sub-questions to answer.
  2) Retrieve: search each sub-question and follow 1-2 second-order leads.
  3) Synthesize: resolve contradictions and write the final answer with citations.
- Stop only when more searching is unlikely to change the conclusion.
</research_mode>

If your host environment uses a specific research tool or requires a submit step, combine this with the host's finalization contract.

`structured_output_contract`

Use when:

the host depends on strict JSON, SQL, or other structured output

<structured_output_contract>
- Output only the requested format.
- Do not add prose or markdown fences unless they were requested.
- Validate that parentheses and brackets are balanced.
- Do not invent tables or fields.
- If required schema information is missing, ask for it or return an explicit error object.
</structured_output_contract>

`bbox_extraction_spec`

Use when:

the workflow extracts OCR boxes, document regions, or other coordinates
layout drift or missed dense regions are common failure modes

<bbox_extraction_spec>
- Use the specified coordinate format exactly, such as [x1,y1,x2,y2] normalized to 0..1.
- For each box, include page, label, text snippet, and confidence.
- Add a vertical-drift sanity check so boxes stay aligned with the correct line of text.
- If the layout is dense, process page by page and do a second pass for missed items.
</bbox_extraction_spec>

`terminal_tool_hygiene`

Use when:

the prompt belongs to a terminal-based or coding-agent workflow
tool misuse or shell misuse has been observed

<terminal_tool_hygiene>
- Only run shell commands through the terminal tool.
- Never try to "run" tool names as shell commands.
- If a patch or edit tool exists, use it directly instead of emulating it in bash.
- After changes, run a lightweight verification step such as ls, tests, or a build before declaring the task done.
</terminal_tool_hygiene>

`user_updates_spec`

Use when:

the workflow is long-running and user updates matter

<user_updates_spec>
- Only update the user when starting a new major phase or when the plan changes.
- Each update should contain:
  - 1 sentence on what changed,
  - 1 sentence on the next step.
- Do not narrate routine tool calls.
- Keep the user-facing update short, even when the actual work is exhaustive.
</user_updates_spec>

If you are using Compaction in the Responses API, compact after major milestones, treat compacted items as opaque state, and keep prompts functionally identical after compaction.

Responses `phase` guidance

For long-running Responses workflows, preambles, or tool-heavy agents that replay assistant items, review whether phase is already preserved.

If the host already round-trips phase, keep it intact during the upgrade.
If the host uses previous_response_id and does not manually replay assistant items, note that this may reduce manual phase handling needs.
If reliable GPT-5.4 behavior would require adding or preserving phase and that would need code edits, treat the case as blocked for prompt-only or model-string-only migration guidance.

Example upgrade profiles

GPT-5.2

Use gpt-5.4
Match the current reasoning effort first
Preserve the existing latency and quality profile before tuning prompt blocks
If the repo does not expose the exact setting, emit same as the starting recommendation

GPT-5.3-Codex

Use gpt-5.4
Match the current reasoning effort first
If you need Codex-style speed and efficiency, add verification blocks before increasing reasoning effort
If the repo does not expose the exact setting, emit same as the starting recommendation

GPT-4o or GPT-4.1 assistant

Use gpt-5.4
Start with none reasoning effort
Add output_verbosity_spec only if output becomes too verbose

Long-horizon agent

Use gpt-5.4
Start with medium reasoning effort
Add tool_persistence_rules
Add completeness_contract
Add verification_loop

Research workflow

Use gpt-5.4
Start with medium reasoning effort
Add research_mode
Add citation_rules
Add empty_result_handling
Add tool_persistence_rules when the host already uses web or retrieval tools
Add parallel_tool_calling when the retrieval steps are independent

Support triage or multi-agent workflow

Use gpt-5.4
Prefer model string + light prompt rewrite over model string only
Add at least one of tool_persistence_rules, completeness_contract, or verification_loop
Add more only if evals show a real regression

Coding or terminal workflow

Use gpt-5.4
Keep the model-string change narrow
Match the current reasoning effort first if you are upgrading from GPT-5.3-Codex
Add terminal_tool_hygiene
Add verification_loop
Add dependency_checks when actions depend on prerequisite lookup or discovery
Add tool_persistence_rules if the agent stops too early
Review whether phase is already preserved for long-running Responses flows or assistant preambles
Do not classify this as blocked just because the workflow uses tools; block only if the upgrade requires changing tool definitions or wiring
If the repo already uses Responses plus tools and no required host-side change is shown, prefer model_string_plus_light_prompt_rewrite over blocked

Prompt regression checklist

Check whether the upgraded prompt still preserves the original task intent.
Check whether the new prompt is leaner, not just longer.
Check completeness, citation quality, dependency handling, verification behavior, and verbosity.
For long-running Responses agents, check whether phase handling is already in place or needs implementation work.
Confirm that each added prompt block addresses an observed regression.
Remove prompt blocks that are not earning their keep.

18 KiB Raw Blame History

GPT-5.4 prompting upgrade guide

Default upgrade posture

Behavioral differences to account for

Prompt rewrite patterns

Prompt blocks

output_verbosity_spec

default_follow_through_policy

instruction_priority

tool_persistence_rules

dig_deeper_nudge

dependency_checks

parallel_tool_calling

completeness_contract

empty_result_handling

verification_loop

missing_context_gating

action_safety

citation_rules

research_mode

structured_output_contract

bbox_extraction_spec

terminal_tool_hygiene

user_updates_spec

Responses phase guidance

Example upgrade profiles

GPT-5.2

GPT-5.3-Codex

GPT-4o or GPT-4.1 assistant

Long-horizon agent

Research workflow

Support triage or multi-agent workflow

Coding or terminal workflow

Prompt regression checklist

18 KiB

Raw Blame History

`output_verbosity_spec`

`default_follow_through_policy`

`instruction_priority`

`tool_persistence_rules`

`dig_deeper_nudge`

`dependency_checks`

`parallel_tool_calling`

`completeness_contract`

`empty_result_handling`

`verification_loop`

`missing_context_gating`

`action_safety`

`citation_rules`

`research_mode`

`structured_output_contract`

`bbox_extraction_spec`

`terminal_tool_hygiene`

`user_updates_spec`

Responses `phase` guidance