2026-02-19 00:33:08 -08:00
parent e37f3dd7b1
commit 70dd0779f2
143 changed files with 31888 additions and 0 deletions


@@ -0,0 +1,96 @@
---
name: brainstorming
description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation."
---
# Brainstorming Ideas Into Designs
## Overview
Help turn ideas into fully formed designs and specs through natural collaborative dialogue.
Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design and get user approval.
<HARD-GATE>
Do NOT invoke any implementation skill, write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity.
</HARD-GATE>
## Anti-Pattern: "This Is Too Simple To Need A Design"
Every project goes through this process. A todo list, a single-function utility, a config change — all of them. "Simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but you MUST present it and get approval.
## Checklist
You MUST create a task for each of these items and complete them in order:
1. **Explore project context** — check files, docs, recent commits
2. **Ask clarifying questions** — one at a time, understand purpose/constraints/success criteria
3. **Propose 2-3 approaches** — with trade-offs and your recommendation
4. **Present design** — in sections scaled to their complexity, get user approval after each section
5. **Write design doc** — save to `docs/plans/YYYY-MM-DD-<topic>-design.md` and commit
6. **Transition to implementation** — invoke writing-plans skill to create implementation plan
## Process Flow
```dot
digraph brainstorming {
"Explore project context" [shape=box];
"Ask clarifying questions" [shape=box];
"Propose 2-3 approaches" [shape=box];
"Present design sections" [shape=box];
"User approves design?" [shape=diamond];
"Write design doc" [shape=box];
"Invoke writing-plans skill" [shape=doublecircle];
"Explore project context" -> "Ask clarifying questions";
"Ask clarifying questions" -> "Propose 2-3 approaches";
"Propose 2-3 approaches" -> "Present design sections";
"Present design sections" -> "User approves design?";
"User approves design?" -> "Present design sections" [label="no, revise"];
"User approves design?" -> "Write design doc" [label="yes"];
"Write design doc" -> "Invoke writing-plans skill";
}
```
**The terminal state is invoking writing-plans.** Do NOT invoke frontend-design, mcp-builder, or any other implementation skill. The ONLY skill you invoke after brainstorming is writing-plans.
## The Process
**Understanding the idea:**
- Check out the current project state first (files, docs, recent commits)
- Ask questions one at a time to refine the idea
- Prefer multiple choice questions when possible, but open-ended is fine too
- Only one question per message - if a topic needs more exploration, break it into multiple questions
- Focus on understanding: purpose, constraints, success criteria
**Exploring approaches:**
- Propose 2-3 different approaches with trade-offs
- Present options conversationally with your recommendation and reasoning
- Lead with your recommended option and explain why
**Presenting the design:**
- Once you believe you understand what you're building, present the design
- Scale each section to its complexity: a few sentences if straightforward, up to 200-300 words if nuanced
- Ask after each section whether it looks right so far
- Cover: architecture, components, data flow, error handling, testing
- Be ready to go back and clarify if something doesn't make sense
## After the Design
**Documentation:**
- Write the validated design to `docs/plans/YYYY-MM-DD-<topic>-design.md`
- Use elements-of-style:writing-clearly-and-concisely skill if available
- Commit the design document to git
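A minimal sketch of the filename convention above, assuming a helper name and slug rules the skill does not itself specify:

```typescript
// Build the docs/plans/YYYY-MM-DD-<topic>-design.md path.
// The helper name and slugging rules are illustrative assumptions.
function designDocPath(topic: string, date: Date = new Date()): string {
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  const dd = String(date.getDate()).padStart(2, "0");
  const slug = topic
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/(^-|-$)/g, "");
  return `docs/plans/${yyyy}-${mm}-${dd}-${slug}-design.md`;
}
```

For example, `designDocPath("Tool Approval Flow", new Date(2026, 1, 19))` yields `docs/plans/2026-02-19-tool-approval-flow-design.md`.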
**Implementation:**
- Invoke the writing-plans skill to create a detailed implementation plan
- Do NOT invoke any other skill. writing-plans is the next step.
## Key Principles
- **One question at a time** - Don't overwhelm with multiple questions
- **Multiple choice preferred** - Easier to answer than open-ended when possible
- **YAGNI ruthlessly** - Remove unnecessary features from all designs
- **Explore alternatives** - Always propose 2-3 approaches before settling
- **Incremental validation** - Present design, get approval before moving on
- **Be flexible** - Go back and clarify when something doesn't make sense


@@ -0,0 +1,180 @@
---
name: dispatching-parallel-agents
description: Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies
---
# Dispatching Parallel Agents
## Overview
When you have multiple unrelated failures (different test files, different subsystems, different bugs), investigating them sequentially wastes time. Each investigation is independent and can happen in parallel.
**Core principle:** Dispatch one agent per independent problem domain. Let them work concurrently.
## When to Use
```dot
digraph when_to_use {
"Multiple failures?" [shape=diamond];
"Are they independent?" [shape=diamond];
"Single agent investigates all" [shape=box];
"One agent per problem domain" [shape=box];
"Can they work in parallel?" [shape=diamond];
"Sequential agents" [shape=box];
"Parallel dispatch" [shape=box];
"Multiple failures?" -> "Are they independent?" [label="yes"];
"Are they independent?" -> "Single agent investigates all" [label="no - related"];
"Are they independent?" -> "Can they work in parallel?" [label="yes"];
"Can they work in parallel?" -> "Parallel dispatch" [label="yes"];
"Can they work in parallel?" -> "Sequential agents" [label="no - shared state"];
}
```
**Use when:**
- 3+ test files failing with different root causes
- Multiple subsystems broken independently
- Each problem can be understood without context from others
- No shared state between investigations
**Don't use when:**
- Failures are related (fix one might fix others)
- Need to understand full system state
- Agents would interfere with each other
## The Pattern
### 1. Identify Independent Domains
Group failures by what's broken:
- File A tests: Tool approval flow
- File B tests: Batch completion behavior
- File C tests: Abort functionality
Each domain is independent - fixing tool approval doesn't affect abort tests.
### 2. Create Focused Agent Tasks
Each agent gets:
- **Specific scope:** One test file or subsystem
- **Clear goal:** Make these tests pass
- **Constraints:** Don't change other code
- **Expected output:** Summary of what you found and fixed
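The four fields above can be sketched as a type (field names are assumptions for illustration, not an API this skill defines):

```typescript
// Illustrative shape for one focused agent task.
interface AgentTask {
  scope: string;          // one test file or subsystem
  goal: string;           // what "done" means
  constraints: string[];  // what the agent must not touch
  expectedOutput: string; // what the agent should return
}

const abortTask: AgentTask = {
  scope: "src/agents/agent-tool-abort.test.ts",
  goal: "Make the 3 failing tests pass",
  constraints: ["Do not change code outside the abort path"],
  expectedOutput: "Summary of root cause and changes made",
};
```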
### 3. Dispatch in Parallel
```typescript
// In Claude Code / AI environment
Task("Fix agent-tool-abort.test.ts failures")
Task("Fix batch-completion-behavior.test.ts failures")
Task("Fix tool-approval-race-conditions.test.ts failures")
// All three run concurrently
```
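If `Task` returns a promise, the concurrent dispatch above amounts to a `Promise.all` over the prompts (the `Task` signature here is a stand-in, not the real runtime API):

```typescript
// Hypothetical Task API; in Claude Code the runtime provides the real one.
async function Task(prompt: string): Promise<string> {
  return `summary for: ${prompt}`; // stand-in for a real agent run
}

// Start every agent before awaiting any, so the runs overlap.
async function dispatchParallel(prompts: string[]): Promise<string[]> {
  return Promise.all(prompts.map((p) => Task(p)));
}

dispatchParallel([
  "Fix agent-tool-abort.test.ts failures",
  "Fix batch-completion-behavior.test.ts failures",
  "Fix tool-approval-race-conditions.test.ts failures",
]).then((summaries) => {
  // One summary per agent, in dispatch order.
  console.log(summaries.length);
});
```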
### 4. Review and Integrate
When agents return:
- Read each summary
- Verify fixes don't conflict
- Run full test suite
- Integrate all changes
## Agent Prompt Structure
Good agent prompts are:
1. **Focused** - One clear problem domain
2. **Self-contained** - All context needed to understand the problem
3. **Specific about output** - What should the agent return?
```markdown
Fix the 3 failing tests in src/agents/agent-tool-abort.test.ts:
1. "should abort tool with partial output capture" - expects 'interrupted at' in message
2. "should handle mixed completed and aborted tools" - fast tool aborted instead of completed
3. "should properly track pendingToolCount" - expects 3 results but gets 0
These are timing/race condition issues. Your task:
1. Read the test file and understand what each test verifies
2. Identify root cause - timing issues or actual bugs?
3. Fix by:
- Replacing arbitrary timeouts with event-based waiting
- Fixing bugs in abort implementation if found
- Adjusting test expectations if testing changed behavior
Do NOT just increase timeouts - find the real issue.
Return: Summary of what you found and what you fixed.
```
## Common Mistakes
**❌ Too broad:** "Fix all the tests" - agent gets lost
**✅ Specific:** "Fix agent-tool-abort.test.ts" - focused scope
**❌ No context:** "Fix the race condition" - agent doesn't know where
**✅ Context:** Paste the error messages and test names
**❌ No constraints:** Agent might refactor everything
**✅ Constraints:** "Do NOT change production code" or "Fix tests only"
**❌ Vague output:** "Fix it" - you don't know what changed
**✅ Specific:** "Return summary of root cause and changes"
## When NOT to Use
**Related failures:** Fixing one might fix others - investigate together first
**Need full context:** Understanding requires seeing entire system
**Exploratory debugging:** You don't know what's broken yet
**Shared state:** Agents would interfere (editing same files, using same resources)
## Real Example from Session
**Scenario:** 6 test failures across 3 files after major refactoring
**Failures:**
- agent-tool-abort.test.ts: 3 failures (timing issues)
- batch-completion-behavior.test.ts: 2 failures (tools not executing)
- tool-approval-race-conditions.test.ts: 1 failure (execution count = 0)
**Decision:** Independent domains - abort logic separate from batch completion separate from race conditions
**Dispatch:**
```
Agent 1 → Fix agent-tool-abort.test.ts
Agent 2 → Fix batch-completion-behavior.test.ts
Agent 3 → Fix tool-approval-race-conditions.test.ts
```
**Results:**
- Agent 1: Replaced timeouts with event-based waiting
- Agent 2: Fixed event structure bug (threadId in wrong place)
- Agent 3: Added wait for async tool execution to complete
**Integration:** All fixes independent, no conflicts, full suite green
**Time saved:** three problems investigated in parallel rather than one after another
## Key Benefits
1. **Parallelization** - Multiple investigations happen simultaneously
2. **Focus** - Each agent has narrow scope, less context to track
3. **Independence** - Agents don't interfere with each other
4. **Speed** - 3 problems solved in the time of 1
## Verification
After agents return:
1. **Review each summary** - Understand what changed
2. **Check for conflicts** - Did agents edit same code?
3. **Run full suite** - Verify all fixes work together
4. **Spot check** - Agents can make systematic errors
## Real-World Impact
From debugging session (2025-10-03):
- 6 failures across 3 files
- 3 agents dispatched in parallel
- All investigations completed concurrently
- All fixes integrated successfully
- Zero conflicts between agent changes


@@ -0,0 +1,84 @@
---
name: executing-plans
description: Use when you have a written implementation plan to execute in a separate session with review checkpoints
---
# Executing Plans
## Overview
Load plan, review critically, execute tasks in batches, report for review between batches.
**Core principle:** Batch execution with checkpoints for architect review.
**Announce at start:** "I'm using the executing-plans skill to implement this plan."
## The Process
### Step 1: Load and Review Plan
1. Read plan file
2. Review critically - identify any questions or concerns about the plan
3. If concerns: Raise them with your human partner before starting
4. If no concerns: Create TodoWrite and proceed
### Step 2: Execute Batch
**Default: First 3 tasks**
For each task:
1. Mark as in_progress
2. Follow each step exactly (plan has bite-sized steps)
3. Run verifications as specified
4. Mark as completed
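The default batching can be sketched as a simple slice over the plan's task list (the helper name is illustrative):

```typescript
// Split plan tasks into execution batches; the default batch size is 3.
function toBatches<T>(tasks: T[], size = 3): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < tasks.length; i += size) {
    batches.push(tasks.slice(i, i + size));
  }
  return batches;
}
```

Each batch is executed in full, then reported for review before the next one starts.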
### Step 3: Report
When batch complete:
- Show what was implemented
- Show verification output
- Say: "Ready for feedback."
### Step 4: Continue
Based on feedback:
- Apply changes if needed
- Execute next batch
- Repeat until complete
### Step 5: Complete Development
After all tasks complete and verified:
- Announce: "I'm using the finishing-a-development-branch skill to complete this work."
- **REQUIRED SUB-SKILL:** Use superpowers:finishing-a-development-branch
- Follow that skill to verify tests, present options, execute choice
## When to Stop and Ask for Help
**STOP executing immediately when:**
- Hit a blocker mid-batch (missing dependency, test fails, instruction unclear)
- Plan has critical gaps preventing starting
- You don't understand an instruction
- Verification fails repeatedly
**Ask for clarification rather than guessing.**
## When to Revisit Earlier Steps
**Return to Review (Step 1) when:**
- Partner updates the plan based on your feedback
- Fundamental approach needs rethinking
**Don't force through blockers** - stop and ask.
## Remember
- Review plan critically first
- Follow plan steps exactly
- Don't skip verifications
- Reference skills when plan says to
- Between batches: just report and wait
- Stop when blocked, don't guess
- Never start implementation on main/master branch without explicit user consent
## Integration
**Required workflow skills:**
- **superpowers:using-git-worktrees** - REQUIRED: Set up isolated workspace before starting
- **superpowers:writing-plans** - Creates the plan this skill executes
- **superpowers:finishing-a-development-branch** - Complete development after all tasks


@@ -0,0 +1,177 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS


@@ -0,0 +1,42 @@
---
name: frontend-design
description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.
license: Complete terms in LICENSE.txt
---
This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.
The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.
## Design Thinking
Before coding, understand the context and commit to a BOLD aesthetic direction:
- **Purpose**: What problem does this interface solve? Who uses it?
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are many flavors to choose from; use these for inspiration, but design a direction that is true to the context.
- **Constraints**: Technical requirements (framework, performance, accessibility).
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.
Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
- Production-grade and functional
- Visually striking and memorable
- Cohesive with a clear aesthetic point-of-view
- Meticulously refined in every detail
## Frontend Aesthetics Guidelines
Focus on:
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for unexpected, characterful choices that elevate the frontend's aesthetics. Pair a distinctive display font with a refined body font.
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use the Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggered effects and hover states that surprise.
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.
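As one hedged sketch of the staggered-reveal idea above, per-element delays can be computed once and then applied as `animation-delay` values (the function name and the 80ms step are illustrative assumptions, not part of this skill):

```typescript
// Delays for a staggered page-load reveal, one entry per element.
function staggerDelays(count: number, stepMs = 80): string[] {
  return Array.from({ length: count }, (_, i) => `${i * stepMs}ms`);
}
// Applied per element, e.g. el.style.animationDelay = staggerDelays(n)[i];
```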
NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character.
Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.
**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.
Remember: Claude is capable of extraordinary creative work. Don't hold back, show what can truly be created when thinking outside the box and committing fully to a distinctive vision.

origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@@ -0,0 +1,25 @@
---
name: gh-address-comments
description: Help address review/issue comments on the open GitHub PR for the current branch using gh CLI; verify gh auth first and prompt the user to authenticate if not logged in.
metadata:
  short-description: Address comments in a GitHub PR review
---
# PR Comment Handler
Guide for finding the open PR for the current branch and addressing its comments with the `gh` CLI. Run all `gh` commands with elevated network access.
Prereq: ensure `gh` is authenticated (run `gh auth login` once if needed), then verify with `gh auth status`; repo and workflow scopes are typically required. If sandboxing blocks `gh auth status`, rerun it with `sandbox_permissions=require_escalated`.
## 1) Inspect comments needing attention
- Run `scripts/fetch_comments.py`, which prints all conversation comments, reviews, and review threads on the PR
## 2) Ask the user for clarification
- Number all the review threads and comments, each with a short summary of what applying a fix would require
- Ask the user which numbered comments should be addressed
## 3) If the user chooses comments
- Apply fixes for the selected comments
Notes:
- If `gh` hits auth or rate-limit issues mid-run, prompt the user to re-authenticate with `gh auth login`, then retry.
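Step 2's numbering can be sketched in Python. This is a minimal illustration, not part of the skill's bundled scripts; the sample `payload` below is hypothetical data mirroring the JSON shape that `scripts/fetch_comments.py` prints:

```python
def number_items(payload: dict) -> list[str]:
    """Number review threads and top-level comments for user selection."""
    lines: list[str] = []
    n = 1
    for thread in payload.get("review_threads", []):
        # The first comment in a thread is the original review remark.
        first = thread["comments"]["nodes"][0]
        status = "resolved" if thread.get("isResolved") else "unresolved"
        lines.append(f"{n}. [{status}] {thread['path']}: {first['body'][:60]}")
        n += 1
    for comment in payload.get("conversation_comments", []):
        lines.append(f"{n}. [conversation] {comment['body'][:60]}")
        n += 1
    return lines

# Hypothetical sample shaped like fetch_comments.py's output
payload = {
    "review_threads": [
        {
            "path": "src/app.py",
            "isResolved": False,
            "comments": {"nodes": [{"body": "Please add a None check here."}]},
        }
    ],
    "conversation_comments": [{"body": "Can we split this PR?"}],
}
for line in number_items(payload):
    print(line)
```

Presenting a numbered, summarized list like this lets the user pick comments by index rather than re-reading full threads.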

@@ -0,0 +1,6 @@
interface:
  display_name: "GitHub Address Comments"
  short_description: "Address comments in a GitHub PR review"
  icon_small: "./assets/github-small.svg"
  icon_large: "./assets/github.png"
  default_prompt: "Address all actionable GitHub PR review comments in this branch and summarize the updates."

@@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" viewBox="0 0 16 16">
<path fill="currentColor" d="M8 1.3a6.665 6.665 0 0 1 5.413 10.56 6.677 6.677 0 0 1-3.288 2.432c-.333.067-.458-.142-.458-.316 0-.226.008-.942.008-1.834 0-.625-.208-1.025-.45-1.233 1.483-.167 3.042-.734 3.042-3.292a2.58 2.58 0 0 0-.684-1.792c.067-.166.3-.85-.066-1.766 0 0-.559-.184-1.834.683a6.186 6.186 0 0 0-1.666-.225c-.567 0-1.134.075-1.667.225-1.275-.858-1.833-.683-1.833-.683-.367.916-.134 1.6-.067 1.766a2.594 2.594 0 0 0-.683 1.792c0 2.55 1.55 3.125 3.033 3.292-.192.166-.367.458-.425.891-.383.175-1.342.459-1.942-.55-.125-.2-.5-.691-1.025-.683-.558.008-.225.317.009.442.283.158.608.75.683.941.133.376.567 1.092 2.242.784 0 .558.008 1.083.008 1.242 0 .174-.125.374-.458.316a6.662 6.662 0 0 1-4.559-6.325A6.665 6.665 0 0 1 8 1.3Z"/>
</svg>

@@ -0,0 +1,237 @@
#!/usr/bin/env python3
"""
Fetch all PR conversation comments + reviews + review threads (inline threads)
for the PR associated with the current git branch, by shelling out to:

    gh api graphql

Requires:
- `gh auth login` already set up
- current branch has an associated (open) PR

Usage:
    python fetch_comments.py > pr_comments.json
"""
from __future__ import annotations

import json
import subprocess
import sys
from typing import Any

QUERY = """\
query(
  $owner: String!,
  $repo: String!,
  $number: Int!,
  $commentsCursor: String,
  $reviewsCursor: String,
  $threadsCursor: String
) {
  repository(owner: $owner, name: $repo) {
    pullRequest(number: $number) {
      number
      url
      title
      state
      # Top-level "Conversation" comments (issue comments on the PR)
      comments(first: 100, after: $commentsCursor) {
        pageInfo { hasNextPage endCursor }
        nodes {
          id
          body
          createdAt
          updatedAt
          author { login }
        }
      }
      # Review submissions (Approve / Request changes / Comment), with body if present
      reviews(first: 100, after: $reviewsCursor) {
        pageInfo { hasNextPage endCursor }
        nodes {
          id
          state
          body
          submittedAt
          author { login }
        }
      }
      # Inline review threads (grouped), includes resolved state
      reviewThreads(first: 100, after: $threadsCursor) {
        pageInfo { hasNextPage endCursor }
        nodes {
          id
          isResolved
          isOutdated
          path
          line
          diffSide
          startLine
          startDiffSide
          originalLine
          originalStartLine
          resolvedBy { login }
          comments(first: 100) {
            nodes {
              id
              body
              createdAt
              updatedAt
              author { login }
            }
          }
        }
      }
    }
  }
}
"""


def _run(cmd: list[str], stdin: str | None = None) -> str:
    p = subprocess.run(cmd, input=stdin, capture_output=True, text=True)
    if p.returncode != 0:
        raise RuntimeError(f"Command failed: {' '.join(cmd)}\n{p.stderr}")
    return p.stdout


def _run_json(cmd: list[str], stdin: str | None = None) -> dict[str, Any]:
    out = _run(cmd, stdin=stdin)
    try:
        return json.loads(out)
    except json.JSONDecodeError as e:
        raise RuntimeError(f"Failed to parse JSON from command output: {e}\nRaw:\n{out}") from e


def _ensure_gh_authenticated() -> None:
    try:
        _run(["gh", "auth", "status"])
    except RuntimeError:
        print("run `gh auth login` to authenticate the GitHub CLI", file=sys.stderr)
        raise RuntimeError(
            "gh auth status failed; run `gh auth login` to authenticate the GitHub CLI"
        ) from None


def gh_pr_view_json(fields: str) -> dict[str, Any]:
    # fields is a comma-separated list like: "number,headRepositoryOwner,headRepository"
    return _run_json(["gh", "pr", "view", "--json", fields])


def get_current_pr_ref() -> tuple[str, str, int]:
    """
    Resolve the PR for the current branch (whatever gh considers associated).
    Works for cross-repo PRs too, by reading head repository owner/name.
    """
    pr = gh_pr_view_json("number,headRepositoryOwner,headRepository")
    owner = pr["headRepositoryOwner"]["login"]
    repo = pr["headRepository"]["name"]
    number = int(pr["number"])
    return owner, repo, number


def gh_api_graphql(
    owner: str,
    repo: str,
    number: int,
    comments_cursor: str | None = None,
    reviews_cursor: str | None = None,
    threads_cursor: str | None = None,
) -> dict[str, Any]:
    """
    Call `gh api graphql` using -F variables, avoiding JSON blobs with nulls.
    Query is passed via stdin using query=@- to avoid shell newline/quoting issues.
    """
    cmd = [
        "gh",
        "api",
        "graphql",
        "-F",
        "query=@-",
        "-F",
        f"owner={owner}",
        "-F",
        f"repo={repo}",
        "-F",
        f"number={number}",
    ]
    if comments_cursor:
        cmd += ["-F", f"commentsCursor={comments_cursor}"]
    if reviews_cursor:
        cmd += ["-F", f"reviewsCursor={reviews_cursor}"]
    if threads_cursor:
        cmd += ["-F", f"threadsCursor={threads_cursor}"]
    return _run_json(cmd, stdin=QUERY)


def fetch_all(owner: str, repo: str, number: int) -> dict[str, Any]:
    conversation_comments: list[dict[str, Any]] = []
    reviews: list[dict[str, Any]] = []
    review_threads: list[dict[str, Any]] = []
    comments_cursor: str | None = None
    reviews_cursor: str | None = None
    threads_cursor: str | None = None
    pr_meta: dict[str, Any] | None = None
    while True:
        payload = gh_api_graphql(
            owner=owner,
            repo=repo,
            number=number,
            comments_cursor=comments_cursor,
            reviews_cursor=reviews_cursor,
            threads_cursor=threads_cursor,
        )
        if "errors" in payload and payload["errors"]:
            raise RuntimeError(f"GitHub GraphQL errors:\n{json.dumps(payload['errors'], indent=2)}")
        pr = payload["data"]["repository"]["pullRequest"]
        if pr_meta is None:
            pr_meta = {
                "number": pr["number"],
                "url": pr["url"],
                "title": pr["title"],
                "state": pr["state"],
                "owner": owner,
                "repo": repo,
            }
        c = pr["comments"]
        r = pr["reviews"]
        t = pr["reviewThreads"]
        conversation_comments.extend(c.get("nodes") or [])
        reviews.extend(r.get("nodes") or [])
        review_threads.extend(t.get("nodes") or [])
        # Always advance the cursors; resetting an exhausted connection back to
        # None would refetch its first page on the next iteration and duplicate
        # nodes whenever the three connections have different page counts.
        comments_cursor = c["pageInfo"]["endCursor"] or comments_cursor
        reviews_cursor = r["pageInfo"]["endCursor"] or reviews_cursor
        threads_cursor = t["pageInfo"]["endCursor"] or threads_cursor
        if not (
            c["pageInfo"]["hasNextPage"]
            or r["pageInfo"]["hasNextPage"]
            or t["pageInfo"]["hasNextPage"]
        ):
            break
    assert pr_meta is not None
    return {
        "pull_request": pr_meta,
        "conversation_comments": conversation_comments,
        "reviews": reviews,
        "review_threads": review_threads,
    }


def main() -> None:
    _ensure_gh_authenticated()
    owner, repo, number = get_current_pr_ref()
    result = fetch_all(owner, repo, number)
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
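When summarizing the script's output, the threads that still need attention are usually the unresolved, non-outdated ones. A small sketch of that filter, assuming you are reading the JSON the script prints (the `report` below is a hypothetical sample in that shape):

```python
import json

def actionable_threads(report: dict) -> list[dict]:
    """Keep review threads that are unresolved and not outdated."""
    return [
        t
        for t in report.get("review_threads", [])
        if not t.get("isResolved") and not t.get("isOutdated")
    ]

# Hypothetical report shaped like fetch_comments.py's output
report = json.loads("""
{
  "review_threads": [
    {"id": "T1", "isResolved": false, "isOutdated": false, "path": "a.py"},
    {"id": "T2", "isResolved": true,  "isOutdated": false, "path": "b.py"},
    {"id": "T3", "isResolved": false, "isOutdated": true,  "path": "c.py"}
  ]
}
""")
for t in actionable_threads(report):
    print(t["id"], t["path"])
```

Outdated threads may still be worth skimming (the underlying code moved), but filtering them out keeps the numbered list focused on comments that map to current lines.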

@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf of
any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@@ -0,0 +1,69 @@
---
name: "gh-fix-ci"
description: "Use when a user asks to debug or fix failing GitHub PR checks that run in GitHub Actions; use `gh` to inspect checks and logs, summarize failure context, draft a fix plan, and implement only after explicit approval. Treat external providers (for example Buildkite) as out of scope and report only the details URL."
---
# GitHub Fix CI
## Overview
Use gh to locate failing PR checks, fetch GitHub Actions logs for actionable failures, summarize the failure snippet, then propose a fix plan and implement after explicit approval.
- If a plan-oriented skill (for example `create-plan`) is available, use it; otherwise draft a concise plan inline and request approval before implementing.
Prereq: authenticate with the standard GitHub CLI once (for example, run `gh auth login`), then confirm with `gh auth status` (repo + workflow scopes are typically required).
## Inputs
- `repo`: path inside the repo (default `.`)
- `pr`: PR number or URL (optional; defaults to current branch PR)
- `gh` authentication for the repo host
## Quick start
- `python "<path-to-skill>/scripts/inspect_pr_checks.py" --repo "." --pr "<number-or-url>"`
- Add `--json` if you want machine-friendly output for summarization.
## Workflow
1. Verify gh authentication.
   - Run `gh auth status` in the repo.
   - If unauthenticated, ask the user to run `gh auth login` (ensuring repo + workflow scopes) before proceeding.
2. Resolve the PR.
   - Prefer the current branch PR: `gh pr view --json number,url`.
   - If the user provides a PR number or URL, use that directly.
3. Inspect failing checks (GitHub Actions only).
   - Preferred: run the bundled script (handles gh field drift and job-log fallbacks):
     - `python "<path-to-skill>/scripts/inspect_pr_checks.py" --repo "." --pr "<number-or-url>"`
     - Add `--json` for machine-friendly output.
   - Manual fallback:
     - `gh pr checks <pr> --json name,state,bucket,link,startedAt,completedAt,workflow`
     - If a field is rejected, rerun with the available fields reported by `gh`.
   - For each failing check, extract the run id from `detailsUrl` and run:
     - `gh run view <run_id> --json name,workflowName,conclusion,status,url,event,headBranch,headSha`
     - `gh run view <run_id> --log`
   - If the run log says it is still in progress, fetch job logs directly:
     - `gh api "/repos/<owner>/<repo>/actions/jobs/<job_id>/logs" > "<path>"`
4. Scope non-GitHub Actions checks.
   - If `detailsUrl` is not a GitHub Actions run, label it as external and only report the URL.
   - Do not attempt Buildkite or other providers; keep the workflow lean.
5. Summarize failures for the user.
   - Provide the failing check name, run URL (if any), and a concise log snippet.
   - Call out missing logs explicitly.
6. Create a plan.
   - Use the `create-plan` skill if available (otherwise draft a concise plan inline) and request approval.
7. Implement after approval.
   - Apply the approved plan, summarize diffs/tests, and ask about opening a PR.
8. Recheck status.
   - After changes, suggest re-running the relevant tests and `gh pr checks` to confirm.
## Bundled Resources
### scripts/inspect_pr_checks.py
Fetch failing PR checks, pull GitHub Actions logs, and extract a failure snippet. Exits non-zero when failures remain so it can be used in automation.
Usage examples:
- `python "<path-to-skill>/scripts/inspect_pr_checks.py" --repo "." --pr "123"`
- `python "<path-to-skill>/scripts/inspect_pr_checks.py" --repo "." --pr "https://github.com/org/repo/pull/123" --json`
- `python "<path-to-skill>/scripts/inspect_pr_checks.py" --repo "." --max-lines 200 --context 40`
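The run-id extraction in step 3 can be sketched with a regex. This is an illustrative helper, not necessarily the bundled script's exact implementation; it also treats any non-Actions URL (for example Buildkite) as external by returning `None`:

```python
from __future__ import annotations

import re

# A GitHub Actions details URL looks like:
#   https://github.com/<owner>/<repo>/actions/runs/<run_id>[/job/<job_id>]
ACTIONS_RUN_RE = re.compile(r"https://github\.com/[^/]+/[^/]+/actions/runs/(\d+)")

def extract_run_id(details_url: str | None) -> str | None:
    """Return the GitHub Actions run id, or None for external providers."""
    m = ACTIONS_RUN_RE.match(details_url or "")
    return m.group(1) if m else None

print(extract_run_id("https://github.com/org/repo/actions/runs/9876543210/job/1"))
print(extract_run_id("https://buildkite.com/org/pipeline/builds/42"))
```

The returned id can then be fed to `gh run view <run_id> --log`; a `None` result means the check should only be reported by its URL.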

@@ -0,0 +1,6 @@
interface:
display_name: "GitHub Fix CI"
short_description: "Debug failing GitHub Actions CI"
icon_small: "./assets/github-small.svg"
icon_large: "./assets/github.png"
default_prompt: "Inspect failing GitHub Actions checks in this repo, summarize root cause, and propose a focused fix plan."

@@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" viewBox="0 0 16 16">
<path fill="currentColor" d="M8 1.3a6.665 6.665 0 0 1 5.413 10.56 6.677 6.677 0 0 1-3.288 2.432c-.333.067-.458-.142-.458-.316 0-.226.008-.942.008-1.834 0-.625-.208-1.025-.45-1.233 1.483-.167 3.042-.734 3.042-3.292a2.58 2.58 0 0 0-.684-1.792c.067-.166.3-.85-.066-1.766 0 0-.559-.184-1.834.683a6.186 6.186 0 0 0-1.666-.225c-.567 0-1.134.075-1.667.225-1.275-.858-1.833-.683-1.833-.683-.367.916-.134 1.6-.067 1.766a2.594 2.594 0 0 0-.683 1.792c0 2.55 1.55 3.125 3.033 3.292-.192.166-.367.458-.425.891-.383.175-1.342.459-1.942-.55-.125-.2-.5-.691-1.025-.683-.558.008-.225.317.009.442.283.158.608.75.683.941.133.376.567 1.092 2.242.784 0 .558.008 1.083.008 1.242 0 .174-.125.374-.458.316a6.662 6.662 0 0 1-4.559-6.325A6.665 6.665 0 0 1 8 1.3Z"/>
</svg>

@@ -0,0 +1,509 @@
#!/usr/bin/env python3
from __future__ import annotations

import argparse
import json
import re
import subprocess
import sys
from pathlib import Path
from shutil import which
from typing import Any, Iterable, Sequence

FAILURE_CONCLUSIONS = {
    "failure",
    "cancelled",
    "timed_out",
    "action_required",
}
FAILURE_STATES = {
    "failure",
    "error",
    "cancelled",
    "timed_out",
    "action_required",
}
FAILURE_BUCKETS = {"fail"}
FAILURE_MARKERS = (
    "error",
    "fail",
    "failed",
    "traceback",
    "exception",
    "assert",
    "panic",
    "fatal",
    "timeout",
    "segmentation fault",
)
DEFAULT_MAX_LINES = 160
DEFAULT_CONTEXT_LINES = 30
PENDING_LOG_MARKERS = (
    "still in progress",
    "log will be available when it is complete",
)


class GhResult:
    def __init__(self, returncode: int, stdout: str, stderr: str):
        self.returncode = returncode
        self.stdout = stdout
        self.stderr = stderr


def run_gh_command(args: Sequence[str], cwd: Path) -> GhResult:
    process = subprocess.run(
        ["gh", *args],
        cwd=cwd,
        text=True,
        capture_output=True,
    )
    return GhResult(process.returncode, process.stdout, process.stderr)


def run_gh_command_raw(args: Sequence[str], cwd: Path) -> tuple[int, bytes, str]:
    process = subprocess.run(
        ["gh", *args],
        cwd=cwd,
        capture_output=True,
    )
    stderr = process.stderr.decode(errors="replace")
    return process.returncode, process.stdout, stderr


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description=(
            "Inspect failing GitHub PR checks, fetch GitHub Actions logs, and extract a "
            "failure snippet."
        ),
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--repo", default=".", help="Path inside the target Git repository.")
    parser.add_argument(
        "--pr", default=None, help="PR number or URL (defaults to current branch PR)."
    )
    parser.add_argument("--max-lines", type=int, default=DEFAULT_MAX_LINES)
    parser.add_argument("--context", type=int, default=DEFAULT_CONTEXT_LINES)
    parser.add_argument("--json", action="store_true", help="Emit JSON instead of text output.")
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    repo_root = find_git_root(Path(args.repo))
    if repo_root is None:
        print("Error: not inside a Git repository.", file=sys.stderr)
        return 1
    if not ensure_gh_available(repo_root):
        return 1
    pr_value = resolve_pr(args.pr, repo_root)
    if pr_value is None:
        return 1
    checks = fetch_checks(pr_value, repo_root)
    if checks is None:
        return 1
    failing = [c for c in checks if is_failing(c)]
    if not failing:
        print(f"PR #{pr_value}: no failing checks detected.")
        return 0
    results = []
    for check in failing:
        results.append(
            analyze_check(
                check,
                repo_root=repo_root,
                max_lines=max(1, args.max_lines),
                context=max(1, args.context),
            )
        )
    if args.json:
        print(json.dumps({"pr": pr_value, "results": results}, indent=2))
    else:
        render_results(pr_value, results)
    return 1


def find_git_root(start: Path) -> Path | None:
    result = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        cwd=start,
        text=True,
        capture_output=True,
    )
    if result.returncode != 0:
        return None
    return Path(result.stdout.strip())


def ensure_gh_available(repo_root: Path) -> bool:
    if which("gh") is None:
        print("Error: gh is not installed or not on PATH.", file=sys.stderr)
        return False
    result = run_gh_command(["auth", "status"], cwd=repo_root)
    if result.returncode == 0:
        return True
    message = (result.stderr or result.stdout or "").strip()
    print(message or "Error: gh not authenticated.", file=sys.stderr)
    return False


def resolve_pr(pr_value: str | None, repo_root: Path) -> str | None:
    if pr_value:
        return pr_value
    result = run_gh_command(["pr", "view", "--json", "number"], cwd=repo_root)
    if result.returncode != 0:
        message = (result.stderr or result.stdout or "").strip()
        print(message or "Error: unable to resolve PR.", file=sys.stderr)
        return None
    try:
        data = json.loads(result.stdout or "{}")
    except json.JSONDecodeError:
        print("Error: unable to parse PR JSON.", file=sys.stderr)
        return None
    number = data.get("number")
    if not number:
        print("Error: no PR number found.", file=sys.stderr)
        return None
    return str(number)


def fetch_checks(pr_value: str, repo_root: Path) -> list[dict[str, Any]] | None:
    primary_fields = ["name", "state", "conclusion", "detailsUrl", "startedAt", "completedAt"]
    result = run_gh_command(
        ["pr", "checks", pr_value, "--json", ",".join(primary_fields)],
        cwd=repo_root,
    )
    if result.returncode != 0:
        message = "\n".join(filter(None, [result.stderr, result.stdout])).strip()
        available_fields = parse_available_fields(message)
        if available_fields:
            fallback_fields = [
                "name",
                "state",
                "bucket",
                "link",
                "startedAt",
                "completedAt",
                "workflow",
            ]
            selected_fields = [field for field in fallback_fields if field in available_fields]
            if not selected_fields:
                print("Error: no usable fields available for gh pr checks.", file=sys.stderr)
                return None
            result = run_gh_command(
                ["pr", "checks", pr_value, "--json", ",".join(selected_fields)],
                cwd=repo_root,
            )
            if result.returncode != 0:
                message = (result.stderr or result.stdout or "").strip()
                print(message or "Error: gh pr checks failed.", file=sys.stderr)
                return None
        else:
            print(message or "Error: gh pr checks failed.", file=sys.stderr)
            return None
    try:
        data = json.loads(result.stdout or "[]")
    except json.JSONDecodeError:
        print("Error: unable to parse checks JSON.", file=sys.stderr)
        return None
    if not isinstance(data, list):
        print("Error: unexpected checks JSON shape.", file=sys.stderr)
        return None
    return data


def is_failing(check: dict[str, Any]) -> bool:
    conclusion = normalize_field(check.get("conclusion"))
    if conclusion in FAILURE_CONCLUSIONS:
        return True
    state = normalize_field(check.get("state") or check.get("status"))
    if state in FAILURE_STATES:
        return True
    bucket = normalize_field(check.get("bucket"))
    return bucket in FAILURE_BUCKETS


def analyze_check(
    check: dict[str, Any],
    repo_root: Path,
    max_lines: int,
context: int,
) -> dict[str, Any]:
url = check.get("detailsUrl") or check.get("link") or ""
run_id = extract_run_id(url)
job_id = extract_job_id(url)
base: dict[str, Any] = {
"name": check.get("name", ""),
"detailsUrl": url,
"runId": run_id,
"jobId": job_id,
}
if run_id is None:
base["status"] = "external"
base["note"] = "No GitHub Actions run id detected in detailsUrl."
return base
metadata = fetch_run_metadata(run_id, repo_root)
log_text, log_error, log_status = fetch_check_log(
run_id=run_id,
job_id=job_id,
repo_root=repo_root,
)
if log_status == "pending":
base["status"] = "log_pending"
base["note"] = log_error or "Logs are not available yet."
if metadata:
base["run"] = metadata
return base
if log_error:
base["status"] = "log_unavailable"
base["error"] = log_error
if metadata:
base["run"] = metadata
return base
snippet = extract_failure_snippet(log_text, max_lines=max_lines, context=context)
base["status"] = "ok"
base["run"] = metadata or {}
base["logSnippet"] = snippet
base["logTail"] = tail_lines(log_text, max_lines)
return base
def extract_run_id(url: str) -> str | None:
if not url:
return None
for pattern in (r"/actions/runs/(\d+)", r"/runs/(\d+)"):
match = re.search(pattern, url)
if match:
return match.group(1)
return None
def extract_job_id(url: str) -> str | None:
if not url:
return None
match = re.search(r"/actions/runs/\d+/job/(\d+)", url)
if match:
return match.group(1)
match = re.search(r"/job/(\d+)", url)
if match:
return match.group(1)
return None
def fetch_run_metadata(run_id: str, repo_root: Path) -> dict[str, Any] | None:
fields = [
"conclusion",
"status",
"workflowName",
"name",
"event",
"headBranch",
"headSha",
"url",
]
result = run_gh_command(["run", "view", run_id, "--json", ",".join(fields)], cwd=repo_root)
if result.returncode != 0:
return None
try:
data = json.loads(result.stdout or "{}")
except json.JSONDecodeError:
return None
if not isinstance(data, dict):
return None
return data
def fetch_check_log(
run_id: str,
job_id: str | None,
repo_root: Path,
) -> tuple[str, str, str]:
log_text, log_error = fetch_run_log(run_id, repo_root)
if not log_error:
return log_text, "", "ok"
if is_log_pending_message(log_error) and job_id:
job_log, job_error = fetch_job_log(job_id, repo_root)
if job_log:
return job_log, "", "ok"
if job_error and is_log_pending_message(job_error):
return "", job_error, "pending"
if job_error:
return "", job_error, "error"
return "", log_error, "pending"
if is_log_pending_message(log_error):
return "", log_error, "pending"
return "", log_error, "error"
def fetch_run_log(run_id: str, repo_root: Path) -> tuple[str, str]:
result = run_gh_command(["run", "view", run_id, "--log"], cwd=repo_root)
if result.returncode != 0:
error = (result.stderr or result.stdout or "").strip()
return "", error or "gh run view failed"
return result.stdout, ""
def fetch_job_log(job_id: str, repo_root: Path) -> tuple[str, str]:
repo_slug = fetch_repo_slug(repo_root)
if not repo_slug:
return "", "Error: unable to resolve repository name for job logs."
endpoint = f"/repos/{repo_slug}/actions/jobs/{job_id}/logs"
returncode, stdout_bytes, stderr = run_gh_command_raw(["api", endpoint], cwd=repo_root)
if returncode != 0:
message = (stderr or stdout_bytes.decode(errors="replace")).strip()
return "", message or "gh api job logs failed"
if is_zip_payload(stdout_bytes):
return "", "Job logs returned a zip archive; unable to parse."
return stdout_bytes.decode(errors="replace"), ""
def fetch_repo_slug(repo_root: Path) -> str | None:
result = run_gh_command(["repo", "view", "--json", "nameWithOwner"], cwd=repo_root)
if result.returncode != 0:
return None
try:
data = json.loads(result.stdout or "{}")
except json.JSONDecodeError:
return None
name_with_owner = data.get("nameWithOwner")
if not name_with_owner:
return None
return str(name_with_owner)
def normalize_field(value: Any) -> str:
if value is None:
return ""
return str(value).strip().lower()
def parse_available_fields(message: str) -> list[str]:
if "Available fields:" not in message:
return []
fields: list[str] = []
collecting = False
for line in message.splitlines():
if "Available fields:" in line:
collecting = True
continue
if not collecting:
continue
field = line.strip()
if not field:
continue
fields.append(field)
return fields
def is_log_pending_message(message: str) -> bool:
lowered = message.lower()
return any(marker in lowered for marker in PENDING_LOG_MARKERS)
def is_zip_payload(payload: bytes) -> bool:
return payload.startswith(b"PK")
def extract_failure_snippet(log_text: str, max_lines: int, context: int) -> str:
lines = log_text.splitlines()
if not lines:
return ""
marker_index = find_failure_index(lines)
if marker_index is None:
return "\n".join(lines[-max_lines:])
start = max(0, marker_index - context)
    end = min(len(lines), marker_index + context + 1)  # include the marker line plus trailing context
window = lines[start:end]
if len(window) > max_lines:
window = window[-max_lines:]
return "\n".join(window)
def find_failure_index(lines: Sequence[str]) -> int | None:
for idx in range(len(lines) - 1, -1, -1):
lowered = lines[idx].lower()
if any(marker in lowered for marker in FAILURE_MARKERS):
return idx
return None
def tail_lines(text: str, max_lines: int) -> str:
if max_lines <= 0:
return ""
lines = text.splitlines()
return "\n".join(lines[-max_lines:])
def render_results(pr_number: str, results: Iterable[dict[str, Any]]) -> None:
results_list = list(results)
print(f"PR #{pr_number}: {len(results_list)} failing checks analyzed.")
for result in results_list:
print("-" * 60)
print(f"Check: {result.get('name', '')}")
if result.get("detailsUrl"):
print(f"Details: {result['detailsUrl']}")
run_id = result.get("runId")
if run_id:
print(f"Run ID: {run_id}")
job_id = result.get("jobId")
if job_id:
print(f"Job ID: {job_id}")
status = result.get("status", "unknown")
print(f"Status: {status}")
run_meta = result.get("run", {})
if run_meta:
branch = run_meta.get("headBranch", "")
sha = (run_meta.get("headSha") or "")[:12]
workflow = run_meta.get("workflowName") or run_meta.get("name") or ""
conclusion = run_meta.get("conclusion") or run_meta.get("status") or ""
print(f"Workflow: {workflow} ({conclusion})")
if branch or sha:
print(f"Branch/SHA: {branch} {sha}")
if run_meta.get("url"):
print(f"Run URL: {run_meta['url']}")
if result.get("note"):
print(f"Note: {result['note']}")
if result.get("error"):
print(f"Error fetching logs: {result['error']}")
continue
snippet = result.get("logSnippet") or ""
if snippet:
print("Failure snippet:")
print(indent_block(snippet, prefix=" "))
else:
print("No snippet available.")
print("-" * 60)
def indent_block(text: str, prefix: str = " ") -> str:
return "\n".join(f"{prefix}{line}" for line in text.splitlines())
if __name__ == "__main__":
raise SystemExit(main())


@@ -0,0 +1,227 @@
---
name: ios-simulator-skill
version: 1.3.0
description: 21 production-ready scripts for iOS app testing, building, and automation. Provides semantic UI navigation, build automation, accessibility testing, and simulator lifecycle management. Optimized for AI agents with minimal token output.
---
# iOS Simulator Skill
Build, test, and automate iOS applications using accessibility-driven navigation and structured data instead of pixel coordinates.
## Quick Start
```bash
# 1. Check environment
bash scripts/sim_health_check.sh
# 2. Launch app
python scripts/app_launcher.py --launch com.example.app
# 3. Map screen to see elements
python scripts/screen_mapper.py
# 4. Tap button
python scripts/navigator.py --find-text "Login" --tap
# 5. Enter text
python scripts/navigator.py --find-type TextField --enter-text "user@example.com"
```
All scripts support `--help` for detailed options and `--json` for machine-readable output.
## 21 Production Scripts
### Build & Development (2 scripts)
1. **build_and_test.py** - Build Xcode projects, run tests, parse results with progressive disclosure
- Build with live result streaming
- Parse errors and warnings from xcresult bundles
- Retrieve detailed build logs on demand
- Options: `--project`, `--scheme`, `--clean`, `--test`, `--verbose`, `--json`
2. **log_monitor.py** - Real-time log monitoring with intelligent filtering
- Stream logs or capture by duration
- Filter by severity (error/warning/info/debug)
- Deduplicate repeated messages
- Options: `--app`, `--severity`, `--follow`, `--duration`, `--output`, `--json`
### Navigation & Interaction (5 scripts)
3. **screen_mapper.py** - Analyze current screen and list interactive elements
- Element type breakdown
- Interactive button list
- Text field status
- Options: `--verbose`, `--hints`, `--json`
4. **navigator.py** - Find and interact with elements semantically
- Find by text (fuzzy matching)
- Find by element type
- Find by accessibility ID
- Enter text or tap elements
- Options: `--find-text`, `--find-type`, `--find-id`, `--tap`, `--enter-text`, `--json`
5. **gesture.py** - Perform swipes, scrolls, pinches, and complex gestures
- Directional swipes (up/down/left/right)
- Multi-swipe scrolling
- Pinch zoom
- Long press
- Pull to refresh
- Options: `--swipe`, `--scroll`, `--pinch`, `--long-press`, `--refresh`, `--json`
6. **keyboard.py** - Text input and hardware button control
- Type text (fast or slow)
- Special keys (return, delete, tab, space, arrows)
- Hardware buttons (home, lock, volume, screenshot)
- Key combinations
- Options: `--type`, `--key`, `--button`, `--slow`, `--clear`, `--dismiss`, `--json`
7. **app_launcher.py** - App lifecycle management
- Launch apps by bundle ID
- Terminate apps
- Install/uninstall from .app bundles
- Deep link navigation
- List installed apps
- Check app state
- Options: `--launch`, `--terminate`, `--install`, `--uninstall`, `--open-url`, `--list`, `--state`, `--json`
### Testing & Analysis (5 scripts)
8. **accessibility_audit.py** - Check WCAG compliance on current screen
- Critical issues (missing labels, empty buttons, no alt text)
- Warnings (missing hints, small touch targets)
- Info (missing IDs, deep nesting)
- Options: `--verbose`, `--output`, `--json`
9. **visual_diff.py** - Compare two screenshots for visual changes
- Pixel-by-pixel comparison
- Threshold-based pass/fail
- Generate diff images
- Options: `--threshold`, `--output`, `--details`, `--json`
10. **test_recorder.py** - Automatically document test execution
- Capture screenshots and accessibility trees per step
- Generate markdown reports with timing data
- Options: `--test-name`, `--output`, `--verbose`, `--json`
11. **app_state_capture.py** - Create comprehensive debugging snapshots
- Screenshot, UI hierarchy, app logs, device info
- Markdown summary for bug reports
- Options: `--app-bundle-id`, `--output`, `--log-lines`, `--json`
12. **sim_health_check.sh** - Verify environment is properly configured
- Check macOS, Xcode, simctl, IDB, Python
- List available and booted simulators
- Verify Python packages (Pillow)
### Advanced Testing & Permissions (4 scripts)
13. **clipboard.py** - Manage simulator clipboard for paste testing
- Copy text to clipboard
- Test paste flows without manual entry
- Options: `--copy`, `--test-name`, `--expected`, `--json`
14. **status_bar.py** - Override simulator status bar appearance
- Presets: clean (9:41, 100% battery), testing (11:11, 50%), low-battery (20%), airplane (offline)
- Custom time, network, battery, WiFi settings
- Options: `--preset`, `--time`, `--data-network`, `--battery-level`, `--clear`, `--json`
15. **push_notification.py** - Send simulated push notifications
- Simple mode (title + body + badge)
- Custom JSON payloads
- Test notification handling and deep links
- Options: `--bundle-id`, `--title`, `--body`, `--badge`, `--payload`, `--json`
16. **privacy_manager.py** - Grant, revoke, and reset app permissions
- 13 supported services (camera, microphone, location, contacts, photos, calendar, health, etc.)
- Batch operations (comma-separated services)
- Audit trail with test scenario tracking
- Options: `--bundle-id`, `--grant`, `--revoke`, `--reset`, `--list`, `--json`
### Device Lifecycle Management (5 scripts)
17. **simctl_boot.py** - Boot simulators with optional readiness verification
- Boot by UDID or device name
- Wait for device ready with timeout
- Batch boot operations (--all, --type)
- Performance timing
- Options: `--udid`, `--name`, `--wait-ready`, `--timeout`, `--all`, `--type`, `--json`
18. **simctl_shutdown.py** - Gracefully shutdown simulators
- Shutdown by UDID or device name
- Optional verification of shutdown completion
- Batch shutdown operations
- Options: `--udid`, `--name`, `--verify`, `--timeout`, `--all`, `--type`, `--json`
19. **simctl_create.py** - Create simulators dynamically
- Create by device type and iOS version
- List available device types and runtimes
- Custom device naming
- Returns UDID for CI/CD integration
- Options: `--device`, `--runtime`, `--name`, `--list-devices`, `--list-runtimes`, `--json`
20. **simctl_delete.py** - Permanently delete simulators
- Delete by UDID or device name
- Safety confirmation by default (skip with --yes)
- Batch delete operations
- Smart deletion (--old N to keep N per device type)
- Options: `--udid`, `--name`, `--yes`, `--all`, `--type`, `--old`, `--json`
21. **simctl_erase.py** - Factory reset simulators without deletion
- Preserve device UUID (faster than delete+create)
- Erase all, by type, or booted simulators
- Optional verification
- Options: `--udid`, `--name`, `--verify`, `--timeout`, `--all`, `--type`, `--booted`, `--json`
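The `--old N` smart deletion in `simctl_delete.py` can be sketched as a pure selection function. This is an illustrative reimplementation, not the script's actual code; the record shape (`udid`, `deviceType`, `lastBooted`) is an assumption about what the script reads from simctl:

```python
from collections import defaultdict


def select_old_simulators(devices: list[dict], keep: int) -> list[str]:
    """Pick UDIDs to delete, keeping the `keep` most recent per device type.

    Hypothetical record shape: {"udid", "deviceType", "lastBooted"} —
    the real script may key recency off different simctl fields.
    """
    by_type: dict[str, list[dict]] = defaultdict(list)
    for dev in devices:
        by_type[dev["deviceType"]].append(dev)
    doomed: list[str] = []
    for devs in by_type.values():
        devs.sort(key=lambda d: d["lastBooted"], reverse=True)  # newest first
        doomed.extend(d["udid"] for d in devs[keep:])  # everything past the first N
    return doomed
```

Keeping selection separate from deletion like this is what makes a `--yes`-gated dry run cheap to offer.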
## Common Patterns
**Auto-UDID Detection**: Most scripts auto-detect the booted simulator when `--udid` is not provided.
**Device Name Resolution**: Use device names (e.g., "iPhone 16 Pro") instead of UDIDs; scripts resolve them automatically.
**Batch Operations**: Many scripts support `--all` for all simulators or `--type iPhone` for device type filtering.
**Output Formats**: Default is concise human-readable output. Use `--json` for machine-readable output in CI/CD.
**Help**: All scripts support `--help` for detailed options and examples.
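The `--json` convention is what makes these scripts easy to gate CI on. A minimal sketch, assuming the `{"summary": {"critical": ...}}` shape that `accessibility_audit.py` writes with `--output`; treat the field names as an assumption, not a stable contract:

```python
import json


def gate_on_audit(raw: str, max_critical: int = 0) -> bool:
    """Return True when the audit report is within the critical-issue budget.

    Assumes the {"summary": {"critical": ...}} shape emitted by
    accessibility_audit.py; adjust keys if the schema differs.
    """
    report = json.loads(raw)
    summary = report.get("summary", {})
    return summary.get("critical", 0) <= max_critical


sample = '{"summary": {"total": 42, "issues": 3, "critical": 1, "warning": 2, "info": 0}}'
print(gate_on_audit(sample))                  # False: one critical over a zero budget
print(gate_on_audit(sample, max_critical=1))  # True
```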
## Typical Workflow
1. Verify environment: `bash scripts/sim_health_check.sh`
2. Launch app: `python scripts/app_launcher.py --launch com.example.app`
3. Analyze screen: `python scripts/screen_mapper.py`
4. Interact: `python scripts/navigator.py --find-text "Button" --tap`
5. Verify: `python scripts/accessibility_audit.py`
6. Debug if needed: `python scripts/app_state_capture.py --app-bundle-id com.example.app`
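Chained together, those steps amount to an ordered command plan. A sketch that only builds and prints the argv lists (the bundle id and button text are placeholders; nothing is executed here):

```python
import subprocess


def workflow_plan(bundle_id: str, tap_text: str) -> list[list[str]]:
    """Build the argv list for each step of the typical workflow above."""
    return [
        ["bash", "scripts/sim_health_check.sh"],
        ["python", "scripts/app_launcher.py", "--launch", bundle_id],
        ["python", "scripts/screen_mapper.py"],
        ["python", "scripts/navigator.py", "--find-text", tap_text, "--tap"],
        ["python", "scripts/accessibility_audit.py"],
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Execute each step, failing fast; dry_run only prints the commands."""
    for argv in plan:
        if dry_run:
            print("would run:", " ".join(argv))
        else:
            subprocess.run(argv, check=True)  # stop at the first failing step


run_plan(workflow_plan("com.example.app", "Login"))
```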
## Requirements
- macOS 12+
- Xcode Command Line Tools
- Python 3
- IDB (optional, for interactive features)
## Documentation
- **SKILL.md** (this file) - Script reference and quick start
- **README.md** - Installation and examples
- **CLAUDE.md** - Architecture and implementation details
- **references/** - Deep documentation on specific topics
- **examples/** - Complete automation workflows
## Key Design Principles
**Semantic Navigation**: Find elements by meaning (text, type, ID) not pixel coordinates. Survives UI changes.
**Token Efficiency**: Concise default output (3-5 lines) with optional verbose and JSON modes for detailed results.
**Accessibility-First**: Built on standard accessibility APIs for reliability and compatibility.
**Zero Configuration**: Works immediately on any macOS with Xcode. No setup required.
**Structured Data**: Scripts output JSON or formatted text, not raw logs. Easy to parse and integrate.
**Auto-Learning**: Build system remembers your device preference. Configuration stored per-project.
---
Use these scripts directly or let Claude Code invoke them automatically when your request matches the skill description.


@@ -0,0 +1,292 @@
#!/usr/bin/env python3
"""
iOS Simulator Accessibility Audit
Scans the current simulator screen for accessibility compliance issues.
Optimized for minimal token output while maintaining functionality.
Usage: python scripts/accessibility_audit.py [options]
"""
import argparse
import json
import sys
from dataclasses import asdict, dataclass
from typing import Any
from common import flatten_tree, get_accessibility_tree, resolve_udid
@dataclass
class Issue:
"""Represents an accessibility issue."""
severity: str # critical, warning, info
rule: str
element_type: str
issue: str
fix: str
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return asdict(self)
class AccessibilityAuditor:
"""Performs accessibility audits on iOS simulator screens."""
# Critical rules that block users
CRITICAL_RULES = {
"missing_label": lambda e: e.get("type") in ["Button", "Link"] and not e.get("AXLabel"),
"empty_button": lambda e: e.get("type") == "Button"
and not (e.get("AXLabel") or e.get("AXValue")),
"image_no_alt": lambda e: e.get("type") == "Image" and not e.get("AXLabel"),
}
    # Warnings that degrade UX
    WARNING_RULES = {
        "missing_hint": lambda e: e.get("type") in ["Slider", "TextField"] and not e.get("help"),
        # Only flag elements that report a frame; _is_small_target treats a
        # missing frame as 0x0, which would flag everything.
        "small_touch_target": lambda e: bool(e.get("frame"))
        and AccessibilityAuditor._is_small_target(e),
        "missing_traits": lambda e: e.get("type") and not e.get("traits"),
    }
# Info level suggestions
INFO_RULES = {
"no_identifier": lambda e: not e.get("AXUniqueId"),
"deep_nesting": lambda e: e.get("depth", 0) > 5,
}
def __init__(self, udid: str | None = None):
"""Initialize auditor with optional device UDID."""
self.udid = udid
def get_accessibility_tree(self) -> dict:
"""Fetch accessibility tree from simulator using shared utility."""
return get_accessibility_tree(self.udid, nested=True)
@staticmethod
def _is_small_target(element: dict) -> bool:
"""Check if touch target is too small (< 44x44 points)."""
frame = element.get("frame", {})
width = frame.get("width", 0)
height = frame.get("height", 0)
return width < 44 or height < 44
def _flatten_tree(self, node: dict, depth: int = 0) -> list[dict]:
"""Flatten nested accessibility tree for easier processing using shared utility."""
return flatten_tree(node, depth)
def audit_element(self, element: dict) -> list[Issue]:
"""Audit a single element for accessibility issues."""
issues = []
# Check critical rules
for rule_name, rule_func in self.CRITICAL_RULES.items():
if rule_func(element):
issues.append(
Issue(
severity="critical",
rule=rule_name,
element_type=element.get("type", "Unknown"),
issue=self._get_issue_description(rule_name),
fix=self._get_fix_suggestion(rule_name),
)
)
# Check warnings (skip if critical issues found)
if not issues:
for rule_name, rule_func in self.WARNING_RULES.items():
if rule_func(element):
issues.append(
Issue(
severity="warning",
rule=rule_name,
element_type=element.get("type", "Unknown"),
issue=self._get_issue_description(rule_name),
fix=self._get_fix_suggestion(rule_name),
)
)
# Check info level (only if verbose or no other issues)
if not issues:
for rule_name, rule_func in self.INFO_RULES.items():
if rule_func(element):
issues.append(
Issue(
severity="info",
rule=rule_name,
element_type=element.get("type", "Unknown"),
issue=self._get_issue_description(rule_name),
fix=self._get_fix_suggestion(rule_name),
)
)
return issues
def _get_issue_description(self, rule: str) -> str:
"""Get human-readable issue description."""
descriptions = {
"missing_label": "Interactive element missing accessibility label",
"empty_button": "Button has no text or label",
"image_no_alt": "Image missing alternative text",
"missing_hint": "Complex control missing hint",
"small_touch_target": "Touch target smaller than 44x44pt",
"missing_traits": "Element missing accessibility traits",
"no_identifier": "Missing accessibility identifier",
"deep_nesting": "Deeply nested (>5 levels)",
}
return descriptions.get(rule, "Accessibility issue")
def _get_fix_suggestion(self, rule: str) -> str:
"""Get fix suggestion for issue."""
fixes = {
"missing_label": "Add accessibilityLabel",
"empty_button": "Set button title or accessibilityLabel",
"image_no_alt": "Add accessibilityLabel with description",
"missing_hint": "Add accessibilityHint",
"small_touch_target": "Increase to minimum 44x44pt",
"missing_traits": "Set appropriate accessibilityTraits",
"no_identifier": "Add accessibilityIdentifier for testing",
"deep_nesting": "Simplify view hierarchy",
}
return fixes.get(rule, "Review accessibility")
def audit(self, verbose: bool = False) -> dict[str, Any]:
"""Perform full accessibility audit."""
# Get accessibility tree
tree = self.get_accessibility_tree()
# Flatten for processing
elements = self._flatten_tree(tree)
# Audit each element
all_issues = []
for element in elements:
issues = self.audit_element(element)
for issue in issues:
issue_dict = issue.to_dict()
# Add minimal element info for context
issue_dict["element"] = {
"type": element.get("type", "Unknown"),
"label": element.get("AXLabel", "")[:30] if element.get("AXLabel") else None,
}
all_issues.append(issue_dict)
# Count by severity
critical = len([i for i in all_issues if i["severity"] == "critical"])
warning = len([i for i in all_issues if i["severity"] == "warning"])
info = len([i for i in all_issues if i["severity"] == "info"])
# Build result (token-optimized)
result = {
"summary": {
"total": len(elements),
"issues": len(all_issues),
"critical": critical,
"warning": warning,
"info": info,
}
}
if verbose:
# Full details only if requested
result["issues"] = all_issues
else:
# Default: top issues only (token-efficient)
result["top_issues"] = self._get_top_issues(all_issues)
return result
def _get_top_issues(self, issues: list[dict]) -> list[dict]:
"""Get top 3 issues grouped by type (token-efficient)."""
if not issues:
return []
# Group by rule
grouped = {}
for issue in issues:
rule = issue["rule"]
if rule not in grouped:
grouped[rule] = {
"severity": issue["severity"],
"rule": rule,
"count": 0,
"fix": issue["fix"],
}
grouped[rule]["count"] += 1
# Sort by severity and count
severity_order = {"critical": 0, "warning": 1, "info": 2}
sorted_issues = sorted(
grouped.values(), key=lambda x: (severity_order[x["severity"]], -x["count"])
)
return sorted_issues[:3]
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(
description="Audit iOS simulator screen for accessibility issues"
)
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
parser.add_argument("--output", help="Save JSON report to file")
parser.add_argument(
"--verbose", action="store_true", help="Include all issue details (increases output)"
)
args = parser.parse_args()
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
print(f"Error: {e}")
sys.exit(1)
# Perform audit
auditor = AccessibilityAuditor(udid=udid)
try:
result = auditor.audit(verbose=args.verbose)
except Exception as e:
print(f"Error: {e}")
sys.exit(1)
# Output results
if args.output:
# Save to file
with open(args.output, "w") as f:
json.dump(result, f, indent=2)
# Print minimal summary
summary = result["summary"]
print(f"Audit complete: {summary['issues']} issues ({summary['critical']} critical)")
print(f"Report saved to: {args.output}")
# Print to stdout (token-optimized by default)
elif args.verbose:
print(json.dumps(result, indent=2))
else:
# Ultra-compact output
summary = result["summary"]
print(f"Elements: {summary['total']}, Issues: {summary['issues']}")
print(
f"Critical: {summary['critical']}, Warning: {summary['warning']}, Info: {summary['info']}"
)
if result.get("top_issues"):
print("\nTop issues:")
for issue in result["top_issues"]:
print(
f" [{issue['severity']}] {issue['rule']} ({issue['count']}x) - {issue['fix']}"
)
# Exit with error if critical issues found
if result["summary"]["critical"] > 0:
sys.exit(1)
if __name__ == "__main__":
main()


@@ -0,0 +1,322 @@
#!/usr/bin/env python3
"""
iOS App Launcher - App Lifecycle Control
Launches, terminates, and manages iOS apps in the simulator.
Handles deep links and app switching.
Usage: python scripts/app_launcher.py --launch com.example.app
"""
import argparse
import contextlib
import subprocess
import sys
import time
from common import build_simctl_command, resolve_udid
class AppLauncher:
"""Controls app lifecycle on iOS simulator."""
def __init__(self, udid: str | None = None):
"""Initialize app launcher."""
self.udid = udid
def launch(self, bundle_id: str, wait_for_debugger: bool = False) -> tuple[bool, int | None]:
"""
Launch an app.
Args:
bundle_id: App bundle identifier
wait_for_debugger: Wait for debugger attachment
Returns:
(success, pid) tuple
"""
cmd = build_simctl_command("launch", self.udid, bundle_id)
if wait_for_debugger:
cmd.insert(3, "--wait-for-debugger") # Insert after "launch" operation
try:
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
# Parse PID from output if available
pid = None
if result.stdout:
# Output format: "com.example.app: <PID>"
parts = result.stdout.strip().split(":")
if len(parts) > 1:
with contextlib.suppress(ValueError):
pid = int(parts[1].strip())
return (True, pid)
except subprocess.CalledProcessError:
return (False, None)
def terminate(self, bundle_id: str) -> bool:
"""
Terminate an app.
Args:
bundle_id: App bundle identifier
Returns:
Success status
"""
cmd = build_simctl_command("terminate", self.udid, bundle_id)
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def install(self, app_path: str) -> bool:
"""
Install an app.
Args:
app_path: Path to .app bundle
Returns:
Success status
"""
cmd = build_simctl_command("install", self.udid, app_path)
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def uninstall(self, bundle_id: str) -> bool:
"""
Uninstall an app.
Args:
bundle_id: App bundle identifier
Returns:
Success status
"""
cmd = build_simctl_command("uninstall", self.udid, bundle_id)
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def open_url(self, url: str) -> bool:
"""
Open URL (for deep linking).
Args:
url: URL to open (http://, myapp://, etc.)
Returns:
Success status
"""
cmd = build_simctl_command("openurl", self.udid, url)
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def list_apps(self) -> list[dict[str, str]]:
"""
List installed apps.
Returns:
List of app info dictionaries
"""
cmd = build_simctl_command("listapps", self.udid)
try:
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
            # simctl listapps prints a plist; convert it to JSON via plutil for parsing
            plist_data = result.stdout
            convert_cmd = ["plutil", "-convert", "json", "-o", "-", "-"]
convert_result = subprocess.run(
convert_cmd, check=False, input=plist_data, capture_output=True, text=True
)
apps = []
if convert_result.returncode == 0:
import json
try:
data = json.loads(convert_result.stdout)
for bundle_id, app_info in data.items():
# Skip system internal apps that are hidden
if app_info.get("ApplicationType") == "Hidden":
continue
apps.append(
{
"bundle_id": bundle_id,
"name": app_info.get(
"CFBundleDisplayName", app_info.get("CFBundleName", bundle_id)
),
"path": app_info.get("Path", ""),
"version": app_info.get("CFBundleVersion", "Unknown"),
"type": app_info.get("ApplicationType", "User"),
}
)
except json.JSONDecodeError:
pass
return apps
except subprocess.CalledProcessError:
return []
def get_app_state(self, bundle_id: str) -> str:
"""
Get app state (running, suspended, etc.).
Args:
bundle_id: App bundle identifier
Returns:
State string or 'unknown'
"""
        # Check launchctl's job list on the simulator for the bundle id
cmd = build_simctl_command("spawn", self.udid, "launchctl", "list")
try:
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
if bundle_id in result.stdout:
return "running"
return "not running"
except subprocess.CalledProcessError:
return "unknown"
def restart_app(self, bundle_id: str, delay: float = 1.0) -> bool:
"""
Restart an app (terminate then launch).
Args:
bundle_id: App bundle identifier
delay: Delay between terminate and launch
Returns:
Success status
"""
# Terminate
self.terminate(bundle_id)
time.sleep(delay)
# Launch
success, _ = self.launch(bundle_id)
return success
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Control iOS app lifecycle")
# Actions
parser.add_argument("--launch", help="Launch app by bundle ID")
parser.add_argument("--terminate", help="Terminate app by bundle ID")
parser.add_argument("--restart", help="Restart app by bundle ID")
parser.add_argument("--install", help="Install app from .app path")
parser.add_argument("--uninstall", help="Uninstall app by bundle ID")
parser.add_argument("--open-url", help="Open URL (deep link)")
parser.add_argument("--list", action="store_true", help="List installed apps")
parser.add_argument("--state", help="Get app state by bundle ID")
# Options
parser.add_argument(
"--wait-for-debugger", action="store_true", help="Wait for debugger when launching"
)
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
args = parser.parse_args()
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
print(f"Error: {e}")
sys.exit(1)
launcher = AppLauncher(udid=udid)
# Execute requested action
if args.launch:
success, pid = launcher.launch(args.launch, args.wait_for_debugger)
if success:
if pid:
print(f"Launched {args.launch} (PID: {pid})")
else:
print(f"Launched {args.launch}")
else:
print(f"Failed to launch {args.launch}")
sys.exit(1)
elif args.terminate:
if launcher.terminate(args.terminate):
print(f"Terminated {args.terminate}")
else:
print(f"Failed to terminate {args.terminate}")
sys.exit(1)
elif args.restart:
if launcher.restart_app(args.restart):
print(f"Restarted {args.restart}")
else:
print(f"Failed to restart {args.restart}")
sys.exit(1)
elif args.install:
if launcher.install(args.install):
print(f"Installed {args.install}")
else:
print(f"Failed to install {args.install}")
sys.exit(1)
elif args.uninstall:
if launcher.uninstall(args.uninstall):
print(f"Uninstalled {args.uninstall}")
else:
print(f"Failed to uninstall {args.uninstall}")
sys.exit(1)
elif args.open_url:
if launcher.open_url(args.open_url):
print(f"Opened URL: {args.open_url}")
else:
print(f"Failed to open URL: {args.open_url}")
sys.exit(1)
elif args.list:
apps = launcher.list_apps()
if apps:
print(f"Installed apps ({len(apps)}):")
for app in apps[:10]: # Limit for token efficiency
print(f" {app['bundle_id']}: {app['name']} (v{app['version']})")
if len(apps) > 10:
print(f" ... and {len(apps) - 10} more")
else:
print("No apps found or failed to list")
elif args.state:
state = launcher.get_app_state(args.state)
print(f"{args.state}: {state}")
else:
parser.print_help()
sys.exit(1)
if __name__ == "__main__":
main()


@@ -0,0 +1,391 @@
#!/usr/bin/env python3
"""
App State Capture for iOS Simulator
Captures complete app state including screenshot, accessibility tree, and logs.
Optimized for minimal token output.
Usage: python scripts/app_state_capture.py [options]
"""
import argparse
import json
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from common import (
capture_screenshot,
count_elements,
get_accessibility_tree,
resolve_udid,
)
class AppStateCapture:
"""Captures comprehensive app state for debugging."""
def __init__(
self,
app_bundle_id: str | None = None,
udid: str | None = None,
inline: bool = False,
screenshot_size: str = "half",
):
"""
Initialize state capture.
Args:
app_bundle_id: Optional app bundle ID for log filtering
udid: Optional device UDID (uses booted if not specified)
inline: If True, return screenshots as base64 (for vision-based automation)
screenshot_size: 'full', 'half', 'quarter', 'thumb' (default: 'half')
"""
self.app_bundle_id = app_bundle_id
self.udid = udid
self.inline = inline
self.screenshot_size = screenshot_size
def capture_screenshot(self, output_path: Path) -> bool:
"""Capture screenshot of current screen."""
cmd = ["xcrun", "simctl", "io"]
if self.udid:
cmd.append(self.udid)
else:
cmd.append("booted")
cmd.extend(["screenshot", str(output_path)])
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def capture_accessibility_tree(self, output_path: Path) -> dict:
"""Capture accessibility tree using shared utility."""
try:
# Use shared utility to fetch tree
tree = get_accessibility_tree(self.udid, nested=True)
# Save tree
with open(output_path, "w") as f:
json.dump(tree, f, indent=2)
# Return summary using shared utility
return {"captured": True, "element_count": count_elements(tree)}
except Exception as e:
return {"captured": False, "error": str(e)}
def capture_logs(self, output_path: Path, line_limit: int = 100) -> dict:
"""Capture recent app logs."""
if not self.app_bundle_id:
# Can't capture logs without app ID
return {"captured": False, "reason": "No app bundle ID specified"}
# Get app name from bundle ID (simplified)
app_name = self.app_bundle_id.split(".")[-1]
cmd = ["xcrun", "simctl", "spawn"]
if self.udid:
cmd.append(self.udid)
else:
cmd.append("booted")
cmd.extend(
[
"log",
"show",
"--predicate",
f'process == "{app_name}"',
"--last",
"1m", # Last 1 minute
"--style",
"compact",
]
)
try:
result = subprocess.run(cmd, check=False, capture_output=True, text=True, timeout=5)
logs = result.stdout
# Limit lines for token efficiency
lines = logs.split("\n")
if len(lines) > line_limit:
lines = lines[-line_limit:]
# Save logs
with open(output_path, "w") as f:
f.write("\n".join(lines))
# Analyze for issues
warning_count = sum(1 for line in lines if "warning" in line.lower())
error_count = sum(1 for line in lines if "error" in line.lower())
return {
"captured": True,
"lines": len(lines),
"warnings": warning_count,
"errors": error_count,
}
except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
return {"captured": False, "error": str(e)}
def capture_device_info(self) -> dict:
"""Get device information."""
cmd = ["xcrun", "simctl", "list", "devices", "booted"]
if self.udid:
# Specific device info
cmd = ["xcrun", "simctl", "list", "devices"]
try:
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
# Parse output for device info (simplified)
lines = result.stdout.split("\n")
device_info = {}
for line in lines:
if "iPhone" in line or "iPad" in line:
# Extract device name and state
parts = line.strip().split("(")
if parts:
device_info["name"] = parts[0].strip()
if len(parts) > 2:
device_info["udid"] = parts[1].replace(")", "").strip()
device_info["state"] = parts[2].replace(")", "").strip()
break
return device_info
except subprocess.CalledProcessError:
return {}
def capture_all(
self, output_dir: str, log_lines: int = 100, app_name: str | None = None
) -> dict:
"""
Capture complete app state.
Args:
output_dir: Directory to save artifacts
log_lines: Number of log lines to capture
app_name: App name for semantic naming (for inline mode)
Returns:
Summary of captured state
"""
# Create output directory (only if not in inline mode)
output_path = Path(output_dir)
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
if not self.inline:
capture_dir = output_path / f"app-state-{timestamp}"
capture_dir.mkdir(parents=True, exist_ok=True)
else:
capture_dir = None
summary = {
"timestamp": datetime.now().isoformat(),
"screenshot_mode": "inline" if self.inline else "file",
}
if capture_dir:
summary["output_dir"] = str(capture_dir)
# Capture screenshot using new unified utility
screenshot_result = capture_screenshot(
self.udid,
size=self.screenshot_size,
inline=self.inline,
app_name=app_name,
)
if self.inline:
# Inline mode: store base64
summary["screenshot"] = {
"mode": "inline",
"base64": screenshot_result["base64_data"],
"width": screenshot_result["width"],
"height": screenshot_result["height"],
"size_preset": self.screenshot_size,
}
else:
# File mode: save to disk
screenshot_path = capture_dir / "screenshot.png"
# Move temp file to target location
import shutil
shutil.move(screenshot_result["file_path"], screenshot_path)
summary["screenshot"] = {
"mode": "file",
"file": "screenshot.png",
"size_bytes": screenshot_result["size_bytes"],
}
# Capture accessibility tree
if not self.inline or capture_dir:
accessibility_path = (capture_dir or output_path) / "accessibility-tree.json"
else:
accessibility_path = None
if accessibility_path:
tree_info = self.capture_accessibility_tree(accessibility_path)
summary["accessibility"] = tree_info
# Capture logs (if app ID provided)
if self.app_bundle_id:
if not self.inline or capture_dir:
logs_path = (capture_dir or output_path) / "app-logs.txt"
else:
logs_path = None
if logs_path:
log_info = self.capture_logs(logs_path, log_lines)
summary["logs"] = log_info
# Get device info
device_info = self.capture_device_info()
if device_info:
summary["device"] = device_info
# Save device info (file mode only)
if capture_dir:
with open(capture_dir / "device-info.json", "w") as f:
json.dump(device_info, f, indent=2)
# Save summary (file mode only)
if capture_dir:
with open(capture_dir / "summary.json", "w") as f:
json.dump(summary, f, indent=2)
# Create markdown summary
self._create_summary_md(capture_dir, summary)
return summary
def _create_summary_md(self, capture_dir: Path, summary: dict) -> None:
"""Create markdown summary file."""
md_path = capture_dir / "summary.md"
with open(md_path, "w") as f:
f.write("# App State Capture\n\n")
f.write(f"**Timestamp:** {summary['timestamp']}\n\n")
if "device" in summary:
f.write("## Device\n")
device = summary["device"]
f.write(f"- Name: {device.get('name', 'Unknown')}\n")
f.write(f"- UDID: {device.get('udid', 'N/A')}\n")
f.write(f"- State: {device.get('state', 'Unknown')}\n\n")
f.write("## Screenshot\n")
f.write("![Current Screen](screenshot.png)\n\n")
if "accessibility" in summary:
acc = summary["accessibility"]
f.write("## Accessibility\n")
if acc.get("captured"):
f.write(f"- Elements: {acc.get('element_count', 0)}\n")
else:
f.write(f"- Error: {acc.get('error', 'Unknown')}\n")
f.write("\n")
if "logs" in summary:
logs = summary["logs"]
f.write("## Logs\n")
if logs.get("captured"):
f.write(f"- Lines: {logs.get('lines', 0)}\n")
f.write(f"- Warnings: {logs.get('warnings', 0)}\n")
f.write(f"- Errors: {logs.get('errors', 0)}\n")
else:
f.write(f"- {logs.get('reason', logs.get('error', 'Not captured'))}\n")
f.write("\n")
f.write("## Files\n")
f.write("- `screenshot.png` - Current screen\n")
f.write("- `accessibility-tree.json` - Full UI hierarchy\n")
if self.app_bundle_id:
f.write("- `app-logs.txt` - Recent app logs\n")
f.write("- `device-info.json` - Device details\n")
f.write("- `summary.json` - Complete capture metadata\n")
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Capture complete app state for debugging")
parser.add_argument(
"--app-bundle-id", help="App bundle ID for log filtering (e.g., com.example.app)"
)
parser.add_argument(
"--output", default=".", help="Output directory (default: current directory)"
)
parser.add_argument(
"--log-lines", type=int, default=100, help="Number of log lines to capture (default: 100)"
)
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
parser.add_argument(
"--inline",
action="store_true",
help="Return screenshots as base64 (inline mode for vision-based automation)",
)
parser.add_argument(
"--size",
choices=["full", "half", "quarter", "thumb"],
default="half",
help="Screenshot size for token optimization (default: half)",
)
parser.add_argument("--app-name", help="App name for semantic screenshot naming")
args = parser.parse_args()
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
print(f"Error: {e}")
sys.exit(1)
# Create capturer
capturer = AppStateCapture(
app_bundle_id=args.app_bundle_id,
udid=udid,
inline=args.inline,
screenshot_size=args.size,
)
# Capture state
try:
summary = capturer.capture_all(
output_dir=args.output, log_lines=args.log_lines, app_name=args.app_name
)
# Token-efficient output
if "output_dir" in summary:
print(f"State captured: {summary['output_dir']}/")
else:
# Inline mode
print(
f"State captured (inline mode): {summary['screenshot']['width']}x{summary['screenshot']['height']}"
)
# Report any issues found
if "logs" in summary and summary["logs"].get("captured"):
logs = summary["logs"]
if logs["errors"] > 0 or logs["warnings"] > 0:
print(f"Issues found: {logs['errors']} errors, {logs['warnings']} warnings")
if "accessibility" in summary and summary["accessibility"].get("captured"):
print(f"Elements: {summary['accessibility']['element_count']}")
except Exception as e:
print(f"Error: {e}")
sys.exit(1)
if __name__ == "__main__":
main()


@@ -0,0 +1,310 @@
#!/usr/bin/env python3
"""
Build and Test Automation for Xcode Projects
Ultra token-efficient build automation with progressive disclosure via xcresult bundles.
Features:
- Minimal default output (5-10 tokens)
- Progressive disclosure for error/warning/log details
- Native xcresult bundle support
- Clean modular architecture
Usage Examples:
# Build (minimal output)
python scripts/build_and_test.py --project MyApp.xcodeproj
# Output: Build: SUCCESS (0 errors, 3 warnings) [xcresult-20251018-143052]
# Get error details
python scripts/build_and_test.py --get-errors xcresult-20251018-143052
# Get warnings
python scripts/build_and_test.py --get-warnings xcresult-20251018-143052
# Get build log
python scripts/build_and_test.py --get-log xcresult-20251018-143052
# Get everything as JSON
python scripts/build_and_test.py --get-all xcresult-20251018-143052 --json
# List recent builds
python scripts/build_and_test.py --list-xcresults
# Verbose mode (for debugging)
python scripts/build_and_test.py --project MyApp.xcodeproj --verbose
"""
import argparse
import sys
from pathlib import Path
# Import our modular components
from xcode import BuildRunner, OutputFormatter, XCResultCache, XCResultParser
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(
description="Build and test Xcode projects with progressive disclosure",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Build project (minimal output)
python scripts/build_and_test.py --project MyApp.xcodeproj
# Run tests
python scripts/build_and_test.py --project MyApp.xcodeproj --test
# Get error details from previous build
python scripts/build_and_test.py --get-errors xcresult-20251018-143052
# Get all details as JSON
python scripts/build_and_test.py --get-all xcresult-20251018-143052 --json
# List recent builds
python scripts/build_and_test.py --list-xcresults
""",
)
# Build/test mode arguments
build_group = parser.add_argument_group("Build/Test Options")
project_group = build_group.add_mutually_exclusive_group()
project_group.add_argument("--project", help="Path to .xcodeproj file")
project_group.add_argument("--workspace", help="Path to .xcworkspace file")
build_group.add_argument("--scheme", help="Build scheme (auto-detected if not specified)")
build_group.add_argument(
"--configuration",
default="Debug",
choices=["Debug", "Release"],
help="Build configuration (default: Debug)",
)
build_group.add_argument("--simulator", help="Simulator name (default: iPhone 15)")
build_group.add_argument("--clean", action="store_true", help="Clean before building")
build_group.add_argument("--test", action="store_true", help="Run tests")
build_group.add_argument("--suite", help="Specific test suite to run")
# Progressive disclosure arguments
disclosure_group = parser.add_argument_group("Progressive Disclosure Options")
disclosure_group.add_argument(
"--get-errors", metavar="XCRESULT_ID", help="Get error details from xcresult"
)
disclosure_group.add_argument(
"--get-warnings", metavar="XCRESULT_ID", help="Get warning details from xcresult"
)
disclosure_group.add_argument(
"--get-log", metavar="XCRESULT_ID", help="Get build log from xcresult"
)
disclosure_group.add_argument(
"--get-all", metavar="XCRESULT_ID", help="Get all details from xcresult"
)
disclosure_group.add_argument(
"--list-xcresults", action="store_true", help="List recent xcresult bundles"
)
# Output options
output_group = parser.add_argument_group("Output Options")
output_group.add_argument("--verbose", action="store_true", help="Show detailed output")
output_group.add_argument("--json", action="store_true", help="Output as JSON")
args = parser.parse_args()
# Initialize cache
cache = XCResultCache()
# Handle list mode
if args.list_xcresults:
xcresults = cache.list()
if args.json:
import json
print(json.dumps(xcresults, indent=2))
elif not xcresults:
print("No xcresult bundles found")
else:
print(f"Recent XCResult bundles ({len(xcresults)}):")
print()
for xc in xcresults:
print(f" {xc['id']}")
print(f" Created: {xc['created']}")
print(f" Size: {xc['size_mb']} MB")
print()
return 0
# Handle retrieval modes
xcresult_id = args.get_errors or args.get_warnings or args.get_log or args.get_all
if xcresult_id:
xcresult_path = cache.get_path(xcresult_id)
if not xcresult_path or not xcresult_path.exists():
print(f"Error: XCResult bundle not found: {xcresult_id}", file=sys.stderr)
print("Use --list-xcresults to see available bundles", file=sys.stderr)
return 1
# Load cached stderr for progressive disclosure
cached_stderr = cache.get_stderr(xcresult_id)
parser = XCResultParser(xcresult_path, stderr=cached_stderr)
# Get errors
if args.get_errors:
errors = parser.get_errors()
if args.json:
import json
print(json.dumps(errors, indent=2))
else:
print(OutputFormatter.format_errors(errors))
return 0
# Get warnings
if args.get_warnings:
warnings = parser.get_warnings()
if args.json:
import json
print(json.dumps(warnings, indent=2))
else:
print(OutputFormatter.format_warnings(warnings))
return 0
# Get log
if args.get_log:
log = parser.get_build_log()
if log:
print(OutputFormatter.format_log(log))
else:
print("No build log available", file=sys.stderr)
return 1
return 0
# Get all
if args.get_all:
error_count, warning_count = parser.count_issues()
errors = parser.get_errors()
warnings = parser.get_warnings()
build_log = parser.get_build_log()
if args.json:
import json
data = {
"xcresult_id": xcresult_id,
"error_count": error_count,
"warning_count": warning_count,
"errors": errors,
"warnings": warnings,
"log_preview": build_log[:1000] if build_log else None,
}
print(json.dumps(data, indent=2))
else:
print(f"XCResult: {xcresult_id}")
print(f"Errors: {error_count}, Warnings: {warning_count}")
print()
if errors:
print(OutputFormatter.format_errors(errors, limit=10))
print()
if warnings:
print(OutputFormatter.format_warnings(warnings, limit=10))
print()
if build_log:
print("Build Log (last 30 lines):")
print(OutputFormatter.format_log(build_log, lines=30))
return 0
# Build/test mode
if not args.project and not args.workspace:
# Try to auto-detect in current directory
cwd = Path.cwd()
projects = list(cwd.glob("*.xcodeproj"))
workspaces = list(cwd.glob("*.xcworkspace"))
if workspaces:
args.workspace = str(workspaces[0])
elif projects:
args.project = str(projects[0])
else:
parser.error("No project or workspace specified and none found in current directory")
# Initialize builder
builder = BuildRunner(
project_path=args.project,
workspace_path=args.workspace,
scheme=args.scheme,
configuration=args.configuration,
simulator=args.simulator,
cache=cache,
)
# Execute build or test
if args.test:
success, xcresult_id, stderr = builder.test(test_suite=args.suite)
else:
success, xcresult_id, stderr = builder.build(clean=args.clean)
if not xcresult_id and not stderr:
print("Error: Build/test failed without creating xcresult or error output", file=sys.stderr)
return 1
# Save stderr to cache for progressive disclosure
if xcresult_id and stderr:
cache.save_stderr(xcresult_id, stderr)
# Parse results
xcresult_path = cache.get_path(xcresult_id) if xcresult_id else None
parser = XCResultParser(xcresult_path, stderr=stderr)
error_count, warning_count = parser.count_issues()
# Format output
status = "SUCCESS" if success else "FAILED"
# Generate hints for failed builds
hints = None
if not success:
errors = parser.get_errors()
hints = OutputFormatter.generate_hints(errors)
if args.verbose:
# Verbose mode with error/warning details
errors = parser.get_errors() if error_count > 0 else None
warnings = parser.get_warnings() if warning_count > 0 else None
output = OutputFormatter.format_verbose(
status=status,
error_count=error_count,
warning_count=warning_count,
xcresult_id=xcresult_id or "N/A",
errors=errors,
warnings=warnings,
)
print(output)
elif args.json:
# JSON mode
data = {
"success": success,
"xcresult_id": xcresult_id or None,
"error_count": error_count,
"warning_count": warning_count,
}
if hints:
data["hints"] = hints
import json
print(json.dumps(data, indent=2))
else:
# Minimal mode (default)
output = OutputFormatter.format_minimal(
status=status,
error_count=error_count,
warning_count=warning_count,
xcresult_id=xcresult_id or "N/A",
hints=hints,
)
print(output)
# Exit with appropriate code
return 0 if success else 1
if __name__ == "__main__":
sys.exit(main())


@@ -0,0 +1,103 @@
#!/usr/bin/env python3
"""
iOS Simulator Clipboard Manager
Copy text to simulator clipboard for testing paste flows.
Optimized for minimal token output.
Usage: python scripts/clipboard.py --copy "text to copy"
"""
import argparse
import subprocess
import sys
from common import resolve_udid
class ClipboardManager:
"""Manages clipboard operations on iOS simulator."""
def __init__(self, udid: str | None = None):
"""Initialize clipboard manager.
Args:
udid: Optional device UDID (auto-detects booted simulator if None)
"""
self.udid = udid
def copy(self, text: str) -> bool:
"""
Copy text to simulator clipboard.
Args:
text: Text to copy to clipboard
Returns:
Success status
"""
cmd = ["xcrun", "simctl", "pbcopy"]
if self.udid:
cmd.append(self.udid)
else:
cmd.append("booted")
# simctl pbcopy reads the text to copy from stdin, not from an argument
try:
subprocess.run(cmd, input=text, text=True, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Copy text to iOS simulator clipboard")
parser.add_argument("--copy", required=True, help="Text to copy to clipboard")
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
parser.add_argument("--test-name", help="Test scenario name for tracking")
parser.add_argument("--expected", help="Expected behavior after paste")
args = parser.parse_args()
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
print(f"Error: {e}")
sys.exit(1)
# Create manager and copy text
manager = ClipboardManager(udid=udid)
if manager.copy(args.copy):
# Token-efficient output
output = f'Copied: "{args.copy}"'
if args.test_name:
output += f" (test: {args.test_name})"
print(output)
# Provide usage guidance
if args.expected:
print(f"Expected: {args.expected}")
print()
print("Next steps:")
print("1. Tap text field with: python scripts/navigator.py --find-type TextField --tap")
print("2. Paste with: python scripts/keyboard.py --key return")
print(" Or use Cmd+V gesture with: python scripts/keyboard.py --key cmd+v")
else:
print("Failed to copy text to clipboard")
sys.exit(1)
if __name__ == "__main__":
main()


@@ -0,0 +1,59 @@
"""
Common utilities shared across iOS simulator scripts.
This module centralizes genuinely reused code patterns to eliminate duplication
while respecting Jackson's Law - no over-abstraction, only truly shared logic.
Organization:
- device_utils: Device detection, command building, coordinate transformation
- idb_utils: IDB-specific operations (accessibility tree, element manipulation)
- cache_utils: Progressive disclosure caching for large outputs
- screenshot_utils: Screenshot capture with file and inline modes
"""
from .cache_utils import ProgressiveCache, get_cache
from .device_utils import (
build_idb_command,
build_simctl_command,
get_booted_device_udid,
get_device_screen_size,
resolve_udid,
transform_screenshot_coords,
)
from .idb_utils import (
count_elements,
flatten_tree,
get_accessibility_tree,
get_screen_size,
)
from .screenshot_utils import (
capture_screenshot,
format_screenshot_result,
generate_screenshot_name,
get_size_preset,
resize_screenshot,
)
__all__ = [
# cache_utils
"ProgressiveCache",
"get_cache",
# device_utils
"build_idb_command",
"build_simctl_command",
"get_booted_device_udid",
"get_device_screen_size",
"resolve_udid",
"transform_screenshot_coords",
# idb_utils
"count_elements",
"flatten_tree",
"get_accessibility_tree",
"get_screen_size",
# screenshot_utils
"capture_screenshot",
"format_screenshot_result",
"generate_screenshot_name",
"get_size_preset",
"resize_screenshot",
]


@@ -0,0 +1,260 @@
#!/usr/bin/env python3
"""
Progressive disclosure cache for large outputs.
Implements a cache system to support the progressive disclosure pattern:
- Return concise summary with cache_id for large outputs
- User retrieves full details on demand via cache_id
- Reduces token usage by 96% for common queries
Cache directory: ~/.ios-simulator-skill/cache/
Cache expiration: Configurable per cache type (default 1 hour)
Used by:
- sim_list.py - Simulator listing progressive disclosure
- Future: build logs, UI trees, etc.
"""
import json
from datetime import datetime, timedelta
from pathlib import Path
from typing import Any
class ProgressiveCache:
"""Cache for progressive disclosure pattern.
Stores large outputs with timestamped IDs for on-demand retrieval.
Automatically cleans up expired entries.
"""
def __init__(self, cache_dir: str | None = None, max_age_hours: int = 1):
"""Initialize cache system.
Args:
cache_dir: Cache directory path (default: ~/.ios-simulator-skill/cache/)
max_age_hours: Max age for cache entries before expiration (default: 1 hour)
"""
if cache_dir is None:
cache_dir = str(Path("~/.ios-simulator-skill/cache").expanduser())
self.cache_dir = Path(cache_dir)
self.max_age_hours = max_age_hours
# Create cache directory if needed
self.cache_dir.mkdir(parents=True, exist_ok=True)
def save(self, data: dict[str, Any], cache_type: str) -> str:
"""Save data to cache and return cache_id.
Args:
data: Dictionary data to cache
cache_type: Type of cache ('simulator-list', 'build-log', 'ui-tree', etc.)
Returns:
Cache ID like 'sim-20251028-143052' for use in progressive disclosure
Example:
cache_id = cache.save({'devices': [...]}, 'simulator-list')
# Returns: 'sim-20251028-143052'
"""
# Generate cache_id with timestamp
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
cache_prefix = cache_type.split("-")[0] # e.g., 'sim' from 'simulator-list'
cache_id = f"{cache_prefix}-{timestamp}"
# Save to file
cache_file = self.cache_dir / f"{cache_id}.json"
with open(cache_file, "w") as f:
json.dump(
{
"cache_id": cache_id,
"cache_type": cache_type,
"created_at": datetime.now().isoformat(),
"data": data,
},
f,
indent=2,
)
return cache_id
def get(self, cache_id: str) -> dict[str, Any] | None:
"""Retrieve data from cache by cache_id.
Args:
cache_id: Cache ID from save() or list_entries()
Returns:
Cached data dictionary, or None if not found/expired
Example:
data = cache.get('sim-20251028-143052')
if data:
print(f"Found {len(data)} devices")
"""
cache_file = self.cache_dir / f"{cache_id}.json"
if not cache_file.exists():
return None
# Check if expired
if self._is_expired(cache_file):
cache_file.unlink() # Delete expired file
return None
try:
with open(cache_file) as f:
entry = json.load(f)
return entry.get("data")
except (OSError, json.JSONDecodeError):
return None
def list_entries(self, cache_type: str | None = None) -> list[dict[str, Any]]:
"""List available cache entries with metadata.
Args:
cache_type: Filter by type (e.g., 'simulator-list'), or None for all
Returns:
List of cache entries with id, type, created_at, age_seconds
Example:
entries = cache.list_entries('simulator-list')
for entry in entries:
print(f"{entry['id']} - {entry['age_seconds']}s old")
"""
entries = []
for cache_file in sorted(self.cache_dir.glob("*.json"), reverse=True):
# Check if expired
if self._is_expired(cache_file):
cache_file.unlink()
continue
try:
with open(cache_file) as f:
entry = json.load(f)
# Filter by type if specified
if cache_type and entry.get("cache_type") != cache_type:
continue
created_at = datetime.fromisoformat(entry.get("created_at", ""))
age_seconds = (datetime.now() - created_at).total_seconds()
entries.append(
{
"id": entry.get("cache_id"),
"type": entry.get("cache_type"),
"created_at": entry.get("created_at"),
"age_seconds": int(age_seconds),
}
)
except (OSError, json.JSONDecodeError, ValueError):
continue
return entries
def cleanup(self, max_age_hours: int | None = None) -> int:
"""Remove expired cache entries.
Args:
max_age_hours: Age threshold (default: uses instance max_age_hours)
Returns:
Number of entries deleted
Example:
deleted = cache.cleanup()
print(f"Deleted {deleted} expired cache entries")
"""
if max_age_hours is None:
max_age_hours = self.max_age_hours
deleted = 0
for cache_file in self.cache_dir.glob("*.json"):
if self._is_expired(cache_file, max_age_hours):
cache_file.unlink()
deleted += 1
return deleted
def clear(self, cache_type: str | None = None) -> int:
"""Clear all cache entries of a type.
Args:
cache_type: Type to clear (e.g., 'simulator-list'), or None to clear all
Returns:
Number of entries deleted
Example:
cleared = cache.clear('simulator-list')
print(f"Cleared {cleared} simulator list entries")
"""
deleted = 0
for cache_file in self.cache_dir.glob("*.json"):
if cache_type is None:
# Clear all
cache_file.unlink()
deleted += 1
else:
# Clear by type
try:
with open(cache_file) as f:
entry = json.load(f)
if entry.get("cache_type") == cache_type:
cache_file.unlink()
deleted += 1
except (OSError, json.JSONDecodeError):
pass
return deleted
def _is_expired(self, cache_file: Path, max_age_hours: int | None = None) -> bool:
"""Check if cache file is expired.
Args:
cache_file: Path to cache file
max_age_hours: Age threshold (default: uses instance max_age_hours)
Returns:
True if file is older than max_age_hours
"""
if max_age_hours is None:
max_age_hours = self.max_age_hours
try:
with open(cache_file) as f:
entry = json.load(f)
created_at = datetime.fromisoformat(entry.get("created_at", ""))
age = datetime.now() - created_at
return age > timedelta(hours=max_age_hours)
except (OSError, json.JSONDecodeError, ValueError):
return True
# Module-level cache instances (lazy-loaded)
_cache_instances: dict[str, ProgressiveCache] = {}
def get_cache(cache_dir: str | None = None) -> ProgressiveCache:
"""Get or create global cache instance.
Args:
cache_dir: Custom cache directory (uses default if None)
Returns:
ProgressiveCache instance
"""
# Use cache_dir as key, or 'default' if None
key = cache_dir or "default"
if key not in _cache_instances:
_cache_instances[key] = ProgressiveCache(cache_dir)
return _cache_instances[key]


@@ -0,0 +1,432 @@
#!/usr/bin/env python3
"""
Shared device and simulator utilities.
Common patterns for interacting with simulators via xcrun simctl and IDB.
Standardizes command building and device targeting to prevent errors.
Follows Jackson's Law - only extracts genuinely reused patterns.
Used by:
- app_launcher.py (8 call sites) - App lifecycle commands
- Multiple scripts (15+ locations) - IDB command building
- navigator.py, gesture.py - Coordinate transformation
- test_recorder.py, app_state_capture.py - Auto-UDID detection
"""
import json
import re
import subprocess
def build_simctl_command(
operation: str,
udid: str | None = None,
*args,
) -> list[str]:
"""
Build xcrun simctl command with proper device handling.
Standardizes command building to prevent device targeting bugs.
Automatically uses "booted" if no UDID provided.
Used by:
- app_launcher.py: launch, terminate, install, uninstall, openurl, listapps, spawn
- Multiple scripts: generic simctl operations
Args:
operation: simctl operation (launch, terminate, install, etc.)
udid: Device UDID (uses 'booted' if None)
*args: Additional command arguments
Returns:
Complete command list ready for subprocess.run()
Examples:
# Launch app on booted simulator
cmd = build_simctl_command("launch", None, "com.app.bundle")
# Returns: ["xcrun", "simctl", "launch", "booted", "com.app.bundle"]
# Launch on specific device
cmd = build_simctl_command("launch", "ABC123", "com.app.bundle")
# Returns: ["xcrun", "simctl", "launch", "ABC123", "com.app.bundle"]
# Install app on specific device
cmd = build_simctl_command("install", "ABC123", "/path/to/app.app")
# Returns: ["xcrun", "simctl", "install", "ABC123", "/path/to/app.app"]
"""
cmd = ["xcrun", "simctl", operation]
# Add device (booted or specific UDID)
cmd.append(udid if udid else "booted")
# Add remaining arguments
cmd.extend(str(arg) for arg in args)
return cmd
def build_idb_command(
operation: str,
udid: str | None = None,
*args,
) -> list[str]:
"""
Build IDB command with proper device targeting.
Standardizes IDB command building across all scripts using IDB.
Handles device UDID consistently.
Used by:
- navigator.py: ui tap, ui text, ui describe-all
- gesture.py: ui swipe, ui tap
- keyboard.py: ui key, ui text, ui tap
- And more: 15+ locations
Args:
operation: IDB operation path (e.g., "ui tap", "ui text", "ui describe-all")
udid: Device UDID (omits --udid flag if None, IDB uses booted by default)
*args: Additional command arguments
Returns:
Complete command list ready for subprocess.run()
Examples:
# Tap on booted simulator
cmd = build_idb_command("ui tap", None, "200", "400")
# Returns: ["idb", "ui", "tap", "200", "400"]
# Tap on specific device
cmd = build_idb_command("ui tap", "ABC123", "200", "400")
# Returns: ["idb", "ui", "tap", "200", "400", "--udid", "ABC123"]
# Get accessibility tree
cmd = build_idb_command("ui describe-all", "ABC123", "--json", "--nested")
# Returns: ["idb", "ui", "describe-all", "--json", "--nested", "--udid", "ABC123"]
# Enter text
cmd = build_idb_command("ui text", None, "hello world")
# Returns: ["idb", "ui", "text", "hello world"]
"""
# Split operation into parts (e.g., "ui tap" -> ["ui", "tap"])
cmd = ["idb"] + operation.split()
# Add arguments
cmd.extend(str(arg) for arg in args)
# Add device targeting if specified (optional for IDB, uses booted by default)
if udid:
cmd.extend(["--udid", udid])
return cmd
def get_booted_device_udid() -> str | None:
"""
Auto-detect currently booted simulator UDID.
Queries xcrun simctl for booted devices and returns first match.
Returns:
UDID of booted simulator, or None if no simulator is booted.
Example:
udid = get_booted_device_udid()
if udid:
print(f"Booted simulator: {udid}")
else:
print("No simulator is currently booted")
"""
try:
result = subprocess.run(
["xcrun", "simctl", "list", "devices", "booted"],
capture_output=True,
text=True,
check=True,
)
# Parse output to find UDID
# Format: " iPhone 16 Pro (ABC123-DEF456) (Booted)"
for line in result.stdout.split("\n"):
# Look for UUID pattern in parentheses
match = re.search(r"\(([A-F0-9\-]{36})\)", line)
if match:
return match.group(1)
return None
except subprocess.CalledProcessError:
return None
def resolve_udid(udid_arg: str | None) -> str:
"""
Resolve device UDID with auto-detection fallback.
If udid_arg is provided, returns it immediately.
If None, attempts to auto-detect booted simulator.
Raises error if neither is available.
Args:
udid_arg: Explicit UDID from command line, or None
Returns:
Valid UDID string
Raises:
RuntimeError: If no UDID provided and no booted simulator found
Example:
try:
udid = resolve_udid(args.udid) # args.udid might be None
print(f"Using device: {udid}")
except RuntimeError as e:
print(f"Error: {e}")
sys.exit(1)
"""
if udid_arg:
return udid_arg
booted_udid = get_booted_device_udid()
if booted_udid:
return booted_udid
raise RuntimeError(
"No device UDID provided and no simulator is currently booted.\n"
"Boot a simulator or provide --udid explicitly:\n"
" xcrun simctl boot <device-name>\n"
" python scripts/script_name.py --udid <device-udid>"
)
def get_device_screen_size(udid: str) -> tuple[int, int]:
"""
Get actual screen dimensions for device via accessibility tree.
Queries IDB accessibility tree to determine actual device resolution.
Falls back to iPhone 14 defaults (390x844) if detection fails.
Args:
udid: Device UDID
Returns:
Tuple of (width, height) in pixels
Example:
width, height = get_device_screen_size("ABC123")
print(f"Device screen: {width}x{height}")
"""
try:
cmd = build_idb_command("ui describe-all", udid, "--json")
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
# Parse JSON response
data = json.loads(result.stdout)
tree = data[0] if isinstance(data, list) and len(data) > 0 else data
# Get frame size from root element
if tree and "frame" in tree:
frame = tree["frame"]
width = int(frame.get("width", 390))
height = int(frame.get("height", 844))
return (width, height)
# Fallback
return (390, 844)
except Exception:
        # Graceful fallback to iPhone 14 defaults (390x844)
return (390, 844)
def resolve_device_identifier(identifier: str) -> str:
"""
Resolve device name or partial UDID to full UDID.
Supports multiple identifier formats:
- Full UDID: "ABC-123-DEF456..." (36 character UUID)
- Device name: "iPhone 16 Pro" (matches full name)
- Partial match: "iPhone 16" (matches first device containing this string)
- Special: "booted" (resolves to currently booted device)
Args:
identifier: Device UDID, name, or special value "booted"
Returns:
Full device UDID
Raises:
RuntimeError: If identifier cannot be resolved
Example:
udid = resolve_device_identifier("iPhone 16 Pro")
# Returns: "ABC123DEF456..."
udid = resolve_device_identifier("booted")
# Returns UDID of booted simulator
"""
# Handle "booted" special case
if identifier.lower() == "booted":
booted = get_booted_device_udid()
if booted:
return booted
raise RuntimeError(
"No simulator is currently booted. "
"Boot a simulator first: xcrun simctl boot <device-udid>"
)
# Check if already a full UDID (36 character UUID format)
if re.match(r"^[A-F0-9\-]{36}$", identifier, re.IGNORECASE):
return identifier.upper()
# Try to match by device name
simulators = list_simulators(state=None)
exact_matches = [s for s in simulators if s["name"].lower() == identifier.lower()]
if exact_matches:
return exact_matches[0]["udid"]
# Try partial match
partial_matches = [s for s in simulators if identifier.lower() in s["name"].lower()]
if partial_matches:
return partial_matches[0]["udid"]
# No match found
raise RuntimeError(
f"Device '{identifier}' not found. "
f"Use 'xcrun simctl list devices' to see available simulators."
)
def list_simulators(state: str | None = None) -> list[dict]:
"""
List iOS simulators with optional state filtering.
Queries xcrun simctl and returns structured list of simulators.
Optionally filters by state (available, booted, all).
Args:
state: Optional filter - "available", "booted", or None for all
Returns:
List of simulator dicts with keys:
- "name": Device name (e.g., "iPhone 16 Pro")
- "udid": Device UDID (36 char UUID)
- "state": Device state ("Booted", "Shutdown", "Unavailable")
- "runtime": iOS version (e.g., "iOS 18.0", "unavailable")
- "type": Device type ("iPhone", "iPad", "Apple Watch", etc.)
Example:
# List all simulators
all_sims = list_simulators()
print(f"Total simulators: {len(all_sims)}")
# List only available simulators
available = list_simulators(state="available")
for sim in available:
print(f"{sim['name']} ({sim['state']}) - {sim['udid']}")
# List only booted simulators
booted = list_simulators(state="booted")
for sim in booted:
print(f"Booted: {sim['name']}")
"""
try:
# Query simctl for device list
cmd = ["xcrun", "simctl", "list", "devices", "-j"]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
data = json.loads(result.stdout)
simulators = []
# Parse JSON response
# Format: {"devices": {"iOS 18.0": [{...}, {...}], "iOS 17.0": [...], ...}}
for ios_version, devices in data.get("devices", {}).items():
for device in devices:
sim = {
"name": device.get("name", "Unknown"),
"udid": device.get("udid", ""),
"state": device.get("state", "Unknown"),
"runtime": ios_version,
"type": _extract_device_type(device.get("name", "")),
}
simulators.append(sim)
# Apply state filtering
if state == "booted":
return [s for s in simulators if s["state"] == "Booted"]
if state == "available":
return [s for s in simulators if s["state"] == "Shutdown"] # Available to boot
if state is None:
return simulators
return [s for s in simulators if s["state"].lower() == state.lower()]
except (subprocess.CalledProcessError, json.JSONDecodeError, KeyError) as e:
raise RuntimeError(f"Failed to list simulators: {e}") from e
def _extract_device_type(device_name: str) -> str:
"""
Extract device type from device name.
Parses device name to determine type (iPhone, iPad, Watch, etc.).
Args:
device_name: Full device name (e.g., "iPhone 16 Pro")
Returns:
Device type string
Example:
_extract_device_type("iPhone 16 Pro") # Returns "iPhone"
_extract_device_type("iPad Air") # Returns "iPad"
_extract_device_type("Apple Watch Series 9") # Returns "Watch"
"""
    if "iPhone" in device_name:
        return "iPhone"
    if "iPad" in device_name:
        return "iPad"
    if "Watch" in device_name:  # Covers "Apple Watch" as well
        return "Watch"
    if "TV" in device_name:  # Covers "Apple TV" as well
        return "TV"
    return "Unknown"
def transform_screenshot_coords(
x: float,
y: float,
screenshot_width: int,
screenshot_height: int,
device_width: int,
device_height: int,
) -> tuple[int, int]:
"""
Transform screenshot coordinates to device coordinates.
Handles the case where a screenshot was downscaled (e.g., to 'half' size)
and needs to be transformed back to actual device pixel coordinates
for accurate tapping.
The transformation is linear:
device_x = (screenshot_x / screenshot_width) * device_width
device_y = (screenshot_y / screenshot_height) * device_height
Args:
x, y: Coordinates in the screenshot
screenshot_width, screenshot_height: Screenshot dimensions (e.g., 195, 422)
device_width, device_height: Actual device dimensions (e.g., 390, 844)
Returns:
Tuple of (device_x, device_y) in device pixels
Example:
# Screenshot taken at 'half' size: 195x422 (from 390x844 device)
device_x, device_y = transform_screenshot_coords(
100, 200, # Tap point in screenshot
195, 422, # Screenshot dimensions
390, 844 # Device dimensions
)
print(f"Tap at device coords: ({device_x}, {device_y})")
# Output: Tap at device coords: (200, 400)
"""
device_x = int((x / screenshot_width) * device_width)
device_y = int((y / screenshot_height) * device_height)
return (device_x, device_y)


@@ -0,0 +1,180 @@
#!/usr/bin/env python3
"""
Shared IDB utility functions.
This module provides common IDB operations used across multiple scripts.
Follows Jackson's Law - only shared code that's truly reused, not speculative.
Used by:
- navigator.py - Accessibility tree navigation
- screen_mapper.py - UI element analysis
- accessibility_audit.py - WCAG compliance checking
- test_recorder.py - Test documentation
- app_state_capture.py - State snapshots
- gesture.py - Touch gesture operations
"""
import json
import subprocess
import sys
def get_accessibility_tree(udid: str | None = None, nested: bool = True) -> dict:
"""
Fetch accessibility tree from IDB.
The accessibility tree represents the complete UI hierarchy of the current
screen, with all element properties needed for semantic navigation.
Args:
udid: Device UDID (uses booted simulator if None)
nested: Include nested structure (default True). If False, returns flat array.
Returns:
Root element of accessibility tree as dict.
Structure: {
"type": "Window",
"AXLabel": "App Name",
"frame": {"x": 0, "y": 0, "width": 390, "height": 844},
"children": [...]
}
Raises:
SystemExit: If IDB command fails or returns invalid JSON
Example:
tree = get_accessibility_tree("UDID123")
# Root is Window element with all children nested
"""
cmd = ["idb", "ui", "describe-all", "--json"]
if nested:
cmd.append("--nested")
if udid:
cmd.extend(["--udid", udid])
try:
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
tree_data = json.loads(result.stdout)
# IDB returns array format, extract first element (root)
if isinstance(tree_data, list) and len(tree_data) > 0:
return tree_data[0]
return tree_data
except subprocess.CalledProcessError as e:
print(f"Error: Failed to get accessibility tree: {e.stderr}", file=sys.stderr)
sys.exit(1)
except json.JSONDecodeError:
print("Error: Invalid JSON from idb", file=sys.stderr)
sys.exit(1)
def flatten_tree(node: dict, depth: int = 0, elements: list[dict] | None = None) -> list[dict]:
"""
Flatten nested accessibility tree into list of elements.
Converts the hierarchical accessibility tree into a flat list where each
element includes its depth for context.
Used by:
- navigator.py - Element finding
- screen_mapper.py - Element analysis
- accessibility_audit.py - Audit scanning
Args:
node: Root node of tree (typically from get_accessibility_tree)
depth: Current depth (used internally, start at 0)
elements: Accumulator list (used internally, start as None)
Returns:
Flat list of elements, each with "depth" key indicating nesting level.
Structure of each element: {
"type": "Button",
"AXLabel": "Login",
"frame": {...},
"depth": 2,
...
}
Example:
tree = get_accessibility_tree()
flat = flatten_tree(tree)
for elem in flat:
print(f"{' ' * elem['depth']}{elem.get('type')}: {elem.get('AXLabel')}")
"""
if elements is None:
elements = []
# Add current node with depth tracking
node_copy = node.copy()
node_copy["depth"] = depth
elements.append(node_copy)
# Process children recursively
for child in node.get("children", []):
flatten_tree(child, depth + 1, elements)
return elements
def count_elements(node: dict) -> int:
"""
Count total elements in tree (recursive).
Traverses entire tree counting all elements for reporting purposes.
Used by:
- test_recorder.py - Element counting per step
- screen_mapper.py - Summary statistics
Args:
node: Root node of tree
Returns:
Total element count including root and all descendants
Example:
tree = get_accessibility_tree()
total = count_elements(tree)
print(f"Screen has {total} elements")
"""
count = 1
for child in node.get("children", []):
count += count_elements(child)
return count
def get_screen_size(udid: str | None = None) -> tuple[int, int]:
"""
Get screen dimensions from accessibility tree.
Extracts the screen size from the root element's frame. Useful for
gesture calculations and coordinate normalization.
Used by:
- gesture.py - Gesture positioning
- Potentially: screenshot positioning, screen-aware scaling
Args:
udid: Device UDID (uses booted if None)
Returns:
(width, height) tuple. Defaults to (390, 844) if detection fails
or tree cannot be accessed.
Example:
width, height = get_screen_size()
center_x = width // 2
center_y = height // 2
"""
DEFAULT_WIDTH = 390 # iPhone 14
DEFAULT_HEIGHT = 844
try:
tree = get_accessibility_tree(udid, nested=False)
frame = tree.get("frame", {})
width = int(frame.get("width", DEFAULT_WIDTH))
height = int(frame.get("height", DEFAULT_HEIGHT))
return (width, height)
except Exception:
# Silently fall back to defaults if tree access fails
return (DEFAULT_WIDTH, DEFAULT_HEIGHT)
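To illustrate `flatten_tree` and `count_elements` on a concrete input, here is a self-contained run on a toy tree; both helpers are re-declared so the snippet stands alone:

```python
def flatten_tree(node, depth=0, elements=None):
    # Depth-first flatten, annotating each element with its nesting depth.
    if elements is None:
        elements = []
    node_copy = node.copy()
    node_copy["depth"] = depth
    elements.append(node_copy)
    for child in node.get("children", []):
        flatten_tree(child, depth + 1, elements)
    return elements

def count_elements(node):
    # Root plus all descendants.
    return 1 + sum(count_elements(c) for c in node.get("children", []))

tree = {
    "type": "Window",
    "AXLabel": "Demo",
    "children": [
        {"type": "Button", "AXLabel": "Login", "children": []},
        {"type": "TextField", "AXLabel": "Email", "children": []},
    ],
}
flat = flatten_tree(tree)
print(count_elements(tree))        # 3
print([e["depth"] for e in flat])  # [0, 1, 1]
```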


@@ -0,0 +1,338 @@
#!/usr/bin/env python3
"""
Screenshot utilities with dual-mode support.
Provides unified screenshot handling with:
- File-based mode: Persistent artifacts for test documentation
- Inline base64 mode: Vision-based automation for agent analysis
- Size presets: Token optimization (full/half/quarter/thumb)
- Semantic naming: {appName}_{screenName}_{state}_{timestamp}.png
Supports resize operations via PIL (optional dependency).
Used by:
- test_recorder.py - Step-based screenshot recording
- app_state_capture.py - State snapshot captures
"""
import base64
import os
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from typing import Any
# Try to import PIL for resizing, but make it optional
try:
from PIL import Image
HAS_PIL = True
except ImportError:
HAS_PIL = False
def generate_screenshot_name(
app_name: str | None = None,
screen_name: str | None = None,
state: str | None = None,
timestamp: str | None = None,
extension: str = "png",
) -> str:
"""Generate semantic screenshot filename.
Format: {appName}_{screenName}_{state}_{timestamp}.{ext}
Falls back to: screenshot_{timestamp}.{ext}
Args:
app_name: Application name (e.g., 'MyApp')
screen_name: Screen name (e.g., 'Login')
state: State description (e.g., 'Empty', 'Filled', 'Error')
timestamp: ISO timestamp (uses current time if None)
extension: File extension (default: 'png')
Returns:
Semantic filename ready for safe file creation
Example:
name = generate_screenshot_name('MyApp', 'Login', 'Empty')
# Returns: 'MyApp_Login_Empty_20251028-143052.png'
name = generate_screenshot_name()
# Returns: 'screenshot_20251028-143052.png'
"""
if timestamp is None:
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
# Build semantic name
if app_name or screen_name or state:
parts = [app_name, screen_name, state]
parts = [p for p in parts if p] # Filter None/empty
name = "_".join(parts) + f"_{timestamp}"
else:
name = f"screenshot_{timestamp}"
return f"{name}.{extension}"
def get_size_preset(size: str = "half") -> tuple[float, float]:
"""Get scale factors for size preset.
Args:
size: 'full', 'half', 'quarter', 'thumb'
Returns:
Tuple of (scale_x, scale_y) for resizing
Example:
scale_x, scale_y = get_size_preset('half')
# Returns: (0.5, 0.5)
"""
presets = {
"full": (1.0, 1.0),
"half": (0.5, 0.5),
"quarter": (0.25, 0.25),
"thumb": (0.1, 0.1),
}
return presets.get(size, (0.5, 0.5))
def resize_screenshot(
input_path: str,
output_path: str | None = None,
size: str = "half",
quality: int = 85,
) -> tuple[str, int, int]:
"""Resize screenshot for token optimization.
Requires PIL (Pillow). Falls back gracefully without it.
Args:
input_path: Path to original screenshot
output_path: Output path (uses input_path if None)
size: 'full', 'half', 'quarter', 'thumb'
quality: JPEG quality (1-100, default: 85)
Returns:
Tuple of (output_path, width, height) of resized image
Raises:
FileNotFoundError: If input file doesn't exist
ValueError: If PIL not installed and size != 'full'
Example:
output, w, h = resize_screenshot(
'screenshot.png',
'screenshot_half.png',
'half'
)
print(f"Resized to {w}x{h}")
"""
input_file = Path(input_path)
if not input_file.exists():
raise FileNotFoundError(f"Screenshot not found: {input_path}")
# If full size, just copy
if size == "full":
if output_path:
import shutil
shutil.copy(input_path, output_path)
output_file = Path(output_path)
else:
output_file = input_file
# Get original dimensions
if HAS_PIL:
img = Image.open(str(output_file))
return (str(output_file), img.width, img.height)
return (str(output_file), 0, 0) # Dimensions unknown without PIL
# Need PIL to resize
if not HAS_PIL:
raise ValueError(
f"Size preset '{size}' requires PIL (Pillow). " "Install with: pip3 install pillow"
)
# Open original image
img = Image.open(str(input_file))
orig_w, orig_h = img.size
# Calculate new size
scale_x, scale_y = get_size_preset(size)
new_w = int(orig_w * scale_x)
new_h = int(orig_h * scale_y)
# Resize with high-quality resampling
resized = img.resize((new_w, new_h), Image.Resampling.LANCZOS)
# Determine output path
if output_path is None:
# Insert size marker before extension
stem = input_file.stem
suffix = input_file.suffix
output_path = str(input_file.parent / f"{stem}_{size}{suffix}")
# Save resized image
resized.save(output_path, quality=quality, optimize=True)
return (output_path, new_w, new_h)
def capture_screenshot(
udid: str,
output_path: str | None = None,
size: str = "half",
inline: bool = False,
app_name: str | None = None,
screen_name: str | None = None,
state: str | None = None,
) -> dict[str, Any]:
"""Capture screenshot with flexible output modes.
Supports both file-based (persistent artifacts) and inline base64 modes
(for vision-based automation).
Args:
udid: Device UDID
output_path: File path for file mode (generates semantic name if None)
size: 'full', 'half', 'quarter', 'thumb' (default: 'half')
inline: If True, returns base64 data instead of saving to file
app_name: App name for semantic naming
screen_name: Screen name for semantic naming
state: State description for semantic naming
Returns:
Dict with mode-specific fields:
File mode:
{
'mode': 'file',
'file_path': str,
'size_bytes': int,
'width': int,
'height': int,
'size_preset': str
}
Inline mode:
{
'mode': 'inline',
'base64_data': str,
'mime_type': 'image/png',
'width': int,
'height': int,
'size_preset': str
}
Example:
# File mode
result = capture_screenshot('ABC123', app_name='MyApp')
print(f"Saved to: {result['file_path']}")
# Inline mode
result = capture_screenshot('ABC123', inline=True, size='half')
print(f"Screenshot: {result['width']}x{result['height']}")
print(f"Base64: {result['base64_data'][:50]}...")
"""
try:
# Capture raw screenshot to temp file
temp_path = "/tmp/ios_simulator_screenshot.png"
cmd = ["xcrun", "simctl", "io", udid, "screenshot", temp_path]
subprocess.run(cmd, capture_output=True, text=True, check=True)
if inline:
# Inline mode: resize and convert to base64
# Resize if needed
if size != "full" and HAS_PIL:
resized_path, width, height = resize_screenshot(temp_path, size=size)
else:
resized_path = temp_path
# Get dimensions via PIL if available
if HAS_PIL:
img = Image.open(resized_path)
width, height = img.size
else:
width, height = 390, 844 # Fallback to common device size
# Read and encode as base64
with open(resized_path, "rb") as f:
base64_data = base64.b64encode(f.read()).decode("utf-8")
# Clean up temp files
Path(temp_path).unlink(missing_ok=True)
if resized_path != temp_path:
Path(resized_path).unlink(missing_ok=True)
return {
"mode": "inline",
"base64_data": base64_data,
"mime_type": "image/png",
"width": width,
"height": height,
"size_preset": size,
}
# File mode: save to output path with semantic naming
if output_path is None:
output_path = generate_screenshot_name(app_name, screen_name, state)
# Resize if needed
if size != "full" and HAS_PIL:
final_path, width, height = resize_screenshot(temp_path, output_path, size)
else:
# Just move temp to output
import shutil
shutil.move(temp_path, output_path)
final_path = output_path
# Get dimensions via PIL if available
if HAS_PIL:
img = Image.open(final_path)
width, height = img.size
else:
width, height = 390, 844 # Fallback
# Get file size
size_bytes = Path(final_path).stat().st_size
return {
"mode": "file",
"file_path": final_path,
"size_bytes": size_bytes,
"width": width,
"height": height,
"size_preset": size,
}
except subprocess.CalledProcessError as e:
raise RuntimeError(f"Failed to capture screenshot: {e.stderr}") from e
except Exception as e:
raise RuntimeError(f"Screenshot capture error: {e!s}") from e
def format_screenshot_result(result: dict[str, Any]) -> str:
"""Format screenshot result for human-readable output.
Args:
result: Result dictionary from capture_screenshot()
Returns:
Formatted string for printing
Example:
result = capture_screenshot('ABC123', inline=True)
print(format_screenshot_result(result))
"""
if result["mode"] == "file":
return (
f"Screenshot: {result['file_path']}\n"
f"Dimensions: {result['width']}x{result['height']}\n"
f"Size: {result['size_bytes']} bytes"
)
return (
f"Screenshot (inline): {result['width']}x{result['height']}\n"
f"Base64 length: {len(result['base64_data'])} chars"
)


@@ -0,0 +1,394 @@
#!/usr/bin/env python3
"""
iOS Gesture Controller - Swipes and Complex Gestures
Performs navigation gestures like swipes, scrolls, and pinches.
Token-efficient output for common navigation patterns.
This script handles touch gestures for iOS simulator automation. It provides
directional swipes, multi-swipe scrolling, pull-to-refresh, and pinch gestures.
Automatically detects screen size from the device for accurate gesture positioning.
Key Features:
- Directional swipes (up, down, left, right)
- Multi-swipe scrolling with customizable amount
- Pull-to-refresh gesture
- Pinch to zoom (in/out)
- Custom swipe between any two points
- Drag and drop simulation
- Auto-detects screen dimensions from device
Usage Examples:
# Simple directional swipe
python scripts/gesture.py --swipe up --udid <device-id>
# Scroll down multiple times
python scripts/gesture.py --scroll down --scroll-amount 3 --udid <device-id>
# Pull to refresh
python scripts/gesture.py --refresh --udid <device-id>
# Custom swipe coordinates
python scripts/gesture.py --swipe-from 100,500 --swipe-to 100,100 --udid <device-id>
# Pinch to zoom
python scripts/gesture.py --pinch out --udid <device-id>
# Long press at coordinates
python scripts/gesture.py --long-press 200,300 --duration 2.0 --udid <device-id>
Output Format:
Swiped up
Scrolled down (3x)
Performed pull to refresh
Gesture Details:
- Swipes use 70% of screen by default (configurable)
- Scrolls are multiple small 30% swipes with delays
- Start points are offset from edges for reliability
- Screen size auto-detected from accessibility tree root element
- Falls back to iPhone 14 dimensions (390x844) if detection fails
Technical Details:
- Uses `idb ui swipe x1 y1 x2 y2` for gesture execution
- Duration parameter converts to milliseconds for IDB
- Automatically fetches screen size on initialization
- Parses IDB accessibility tree to get root frame dimensions
- All coordinates calculated as fractions of screen size for device independence
"""
import argparse
import subprocess
import sys
import time
from common import (
get_device_screen_size,
get_screen_size,
resolve_udid,
transform_screenshot_coords,
)
class GestureController:
"""Performs gestures on iOS simulator."""
# Standard screen dimensions (will be detected if possible)
DEFAULT_WIDTH = 390 # iPhone 14
DEFAULT_HEIGHT = 844
def __init__(self, udid: str | None = None):
"""Initialize gesture controller."""
self.udid = udid
self.screen_size = self._get_screen_size()
def _get_screen_size(self) -> tuple[int, int]:
"""Try to detect screen size from device using shared utility."""
return get_screen_size(self.udid)
def swipe(self, direction: str, distance_ratio: float = 0.7) -> bool:
"""
Perform directional swipe.
Args:
direction: up, down, left, right
distance_ratio: How far to swipe (0.0-1.0 of screen)
Returns:
Success status
"""
width, height = self.screen_size
center_x = width // 2
center_y = height // 2
        # Calculate swipe coordinates based on direction.
        # Start points are offset from the screen edges; the end point moves
        # distance_ratio of the screen in the swipe direction, clamped on-screen.
        if direction == "up":
            start = (center_x, int(height * 0.7))
            end = (center_x, max(0, int(height * (0.7 - distance_ratio))))
        elif direction == "down":
            start = (center_x, int(height * 0.3))
            end = (center_x, min(height, int(height * (0.3 + distance_ratio))))
        elif direction == "left":
            start = (int(width * 0.8), center_y)
            end = (max(0, int(width * (0.8 - distance_ratio))), center_y)
        elif direction == "right":
            start = (int(width * 0.2), center_y)
            end = (min(width, int(width * (0.2 + distance_ratio))), center_y)
        else:
            return False
return self.swipe_between(start, end)
def swipe_between(
self, start: tuple[int, int], end: tuple[int, int], duration: float = 0.3
) -> bool:
"""
Swipe between two points.
Args:
start: Starting coordinates (x, y)
end: Ending coordinates (x, y)
duration: Swipe duration in seconds
Returns:
Success status
"""
cmd = ["idb", "ui", "swipe"]
cmd.extend([str(start[0]), str(start[1]), str(end[0]), str(end[1])])
        # Pass a non-default duration to IDB, converted to milliseconds
if duration != 0.3:
cmd.extend(["--duration", str(int(duration * 1000))])
if self.udid:
cmd.extend(["--udid", self.udid])
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def scroll(self, direction: str, amount: int = 3) -> bool:
"""
Perform multiple small swipes to scroll.
Args:
direction: up, down
amount: Number of small swipes
Returns:
Success status
"""
for _ in range(amount):
if not self.swipe(direction, distance_ratio=0.3):
return False
time.sleep(0.2) # Small delay between swipes
return True
def tap_and_hold(self, x: int, y: int, duration: float = 2.0) -> bool:
"""
Long press at coordinates.
Args:
x, y: Coordinates
duration: Hold duration in seconds
Returns:
Success status
"""
        # IDB's plain tap has no press-and-hold; tapping and then sleeping
        # approximates a long press but does not keep the touch held down
cmd = ["idb", "ui", "tap", str(x), str(y)]
if self.udid:
cmd.extend(["--udid", self.udid])
try:
subprocess.run(cmd, capture_output=True, check=True)
# Simulate hold with delay
time.sleep(duration)
return True
except subprocess.CalledProcessError:
return False
def pinch(self, direction: str = "out", center: tuple[int, int] | None = None) -> bool:
"""
Perform pinch gesture (zoom in/out).
Args:
direction: 'in' (zoom out) or 'out' (zoom in)
center: Center point for pinch
Returns:
Success status
"""
if not center:
width, height = self.screen_size
center = (width // 2, height // 2)
# Calculate pinch points
offset = 100 if direction == "out" else 50
if direction == "out":
# Zoom in - fingers move apart
start1 = (center[0] - 20, center[1] - 20)
end1 = (center[0] - offset, center[1] - offset)
start2 = (center[0] + 20, center[1] + 20)
end2 = (center[0] + offset, center[1] + offset)
else:
# Zoom out - fingers move together
start1 = (center[0] - offset, center[1] - offset)
end1 = (center[0] - 20, center[1] - 20)
start2 = (center[0] + offset, center[1] + offset)
end2 = (center[0] + 20, center[1] + 20)
# Perform two swipes simultaneously (simulated)
success1 = self.swipe_between(start1, end1)
success2 = self.swipe_between(start2, end2)
return success1 and success2
def drag_and_drop(self, start: tuple[int, int], end: tuple[int, int]) -> bool:
"""
Drag element from one position to another.
Args:
start: Starting coordinates
end: Ending coordinates
Returns:
Success status
"""
# Use slow swipe to simulate drag
return self.swipe_between(start, end, duration=1.0)
def refresh(self) -> bool:
"""Pull to refresh gesture."""
width, _ = self.screen_size
start = (width // 2, 100)
end = (width // 2, 400)
return self.swipe_between(start, end)
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Perform gestures on iOS simulator")
# Gesture options
parser.add_argument(
"--swipe", choices=["up", "down", "left", "right"], help="Perform directional swipe"
)
parser.add_argument("--swipe-from", help="Custom swipe start coordinates (x,y)")
parser.add_argument("--swipe-to", help="Custom swipe end coordinates (x,y)")
parser.add_argument(
"--scroll", choices=["up", "down"], help="Scroll in direction (multiple small swipes)"
)
parser.add_argument(
"--scroll-amount", type=int, default=3, help="Number of scroll swipes (default: 3)"
)
parser.add_argument("--long-press", help="Long press at coordinates (x,y)")
parser.add_argument(
"--duration", type=float, default=2.0, help="Duration for long press in seconds"
)
parser.add_argument(
"--pinch", choices=["in", "out"], help="Pinch gesture (in=zoom out, out=zoom in)"
)
parser.add_argument("--refresh", action="store_true", help="Pull to refresh gesture")
# Coordinate transformation
parser.add_argument(
"--screenshot-coords",
action="store_true",
help="Interpret swipe coordinates as from a screenshot (requires --screenshot-width/height)",
)
parser.add_argument(
"--screenshot-width",
type=int,
help="Screenshot width for coordinate transformation",
)
parser.add_argument(
"--screenshot-height",
type=int,
help="Screenshot height for coordinate transformation",
)
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
args = parser.parse_args()
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
print(f"Error: {e}")
sys.exit(1)
controller = GestureController(udid=udid)
# Execute requested gesture
if args.swipe:
if controller.swipe(args.swipe):
print(f"Swiped {args.swipe}")
else:
print(f"Failed to swipe {args.swipe}")
sys.exit(1)
elif args.swipe_from and args.swipe_to:
# Custom swipe
start = tuple(map(int, args.swipe_from.split(",")))
end = tuple(map(int, args.swipe_to.split(",")))
# Handle coordinate transformation if requested
if args.screenshot_coords:
if not args.screenshot_width or not args.screenshot_height:
print(
"Error: --screenshot-coords requires --screenshot-width and --screenshot-height"
)
sys.exit(1)
device_w, device_h = get_device_screen_size(udid)
start = transform_screenshot_coords(
start[0],
start[1],
args.screenshot_width,
args.screenshot_height,
device_w,
device_h,
)
end = transform_screenshot_coords(
end[0],
end[1],
args.screenshot_width,
args.screenshot_height,
device_w,
device_h,
)
print("Transformed screenshot coords to device coords")
if controller.swipe_between(start, end):
print(f"Swiped from {start} to {end}")
else:
print("Failed to swipe")
sys.exit(1)
elif args.scroll:
if controller.scroll(args.scroll, args.scroll_amount):
print(f"Scrolled {args.scroll} ({args.scroll_amount}x)")
else:
print(f"Failed to scroll {args.scroll}")
sys.exit(1)
elif args.long_press:
coords = tuple(map(int, args.long_press.split(",")))
if controller.tap_and_hold(coords[0], coords[1], args.duration):
print(f"Long pressed at {coords} for {args.duration}s")
else:
print("Failed to long press")
sys.exit(1)
elif args.pinch:
if controller.pinch(args.pinch):
action = "Zoomed in" if args.pinch == "out" else "Zoomed out"
print(action)
else:
print(f"Failed to pinch {args.pinch}")
sys.exit(1)
elif args.refresh:
if controller.refresh():
print("Performed pull to refresh")
else:
print("Failed to refresh")
sys.exit(1)
else:
parser.print_help()
sys.exit(1)
if __name__ == "__main__":
main()
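The directional-swipe arithmetic can be sketched on its own. This is an illustrative helper, not the controller's exact code: the start point is offset from the screen edge and the end point travels `distance_ratio` of the screen, clamped to stay on screen:

```python
def swipe_endpoints(direction, width, height, distance_ratio=0.7):
    # Returns (start, end) coordinate pairs for a directional swipe.
    cx, cy = width // 2, height // 2
    if direction == "up":
        return (cx, int(height * 0.7)), (cx, max(0, int(height * (0.7 - distance_ratio))))
    if direction == "down":
        return (cx, int(height * 0.3)), (cx, min(height, int(height * (0.3 + distance_ratio))))
    if direction == "left":
        return (int(width * 0.8), cy), (max(0, int(width * (0.8 - distance_ratio))), cy)
    if direction == "right":
        return (int(width * 0.2), cy), (min(width, int(width * (0.2 + distance_ratio))), cy)
    raise ValueError(f"Unknown direction: {direction}")

start, end = swipe_endpoints("up", 390, 844)
print(start, end)  # (195, 590) (195, 0)
```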


@@ -0,0 +1,391 @@
#!/usr/bin/env python3
"""
iOS Keyboard Controller - Text Entry and Hardware Buttons
Handles keyboard input, special keys, and hardware button simulation.
Token-efficient text entry and navigation control.
This script provides text input and hardware button control for iOS simulator
automation. It handles both typing text strings and pressing special keys like
return, delete, tab, etc. Also controls hardware buttons like home and lock.
Key Features:
- Type text strings into focused elements
- Press special keys (return, delete, tab, space, arrows)
- Hardware button simulation (home, lock, volume, screenshot)
- Character-by-character typing with delays (for animations)
- Multiple key press support
- iOS HID key code mapping for reliability
Usage Examples:
# Type text into focused field
python scripts/keyboard.py --type "hello@example.com" --udid <device-id>
# Press return key to submit
python scripts/keyboard.py --key return --udid <device-id>
# Press delete 3 times
python scripts/keyboard.py --key delete --count 3 --udid <device-id>
# Press home button
python scripts/keyboard.py --button home --udid <device-id>
# Press lock button
python scripts/keyboard.py --button lock --udid <device-id>
# Type with delay between characters (for animations)
python scripts/keyboard.py --type "slow typing" --slow --udid <device-id>
Output Format:
Typed: "hello@example.com"
Pressed return
Pressed home button
Special Keys Supported:
- return/enter: Submit forms, new lines (HID code 40)
- delete/backspace: Remove characters (HID code 42)
- tab: Navigate between fields (HID code 43)
- space: Space character (HID code 44)
- escape: Cancel/dismiss (HID code 41)
- up/down/left/right: Arrow keys (HID codes 82/81/80/79)
Hardware Buttons Supported:
- home: Return to home screen
- lock/power: Lock device
- volume-up/volume-down: Volume control
- ringer: Toggle mute
- screenshot: Capture screen
Technical Details:
- Uses `idb ui text` for typing text strings
- Uses `idb ui key <code>` for special keys with iOS HID codes
- HID codes from Apple's UIKeyboardHIDUsage specification
- Hardware buttons use `xcrun simctl` button actions
- Text entry works on currently focused element
- Special keys are integers (40=Return, 42=Delete, etc.)
"""
import argparse
import subprocess
import sys
import time
from common import resolve_udid
class KeyboardController:
"""Controls keyboard and hardware buttons on iOS simulator."""
# Special key mappings to iOS HID key codes
# See: https://developer.apple.com/documentation/uikit/uikeyboardhidusage
SPECIAL_KEYS = {
"return": 40,
"enter": 40,
"delete": 42,
"backspace": 42,
"tab": 43,
"space": 44,
"escape": 41,
"up": 82,
"down": 81,
"left": 80,
"right": 79,
}
# Hardware button mappings
HARDWARE_BUTTONS = {
"home": "HOME",
"lock": "LOCK",
"volume-up": "VOLUME_UP",
"volume-down": "VOLUME_DOWN",
"ringer": "RINGER",
"power": "LOCK", # Alias
"screenshot": "SCREENSHOT",
}
def __init__(self, udid: str | None = None):
"""Initialize keyboard controller."""
self.udid = udid
def type_text(self, text: str, delay: float = 0.0) -> bool:
"""
Type text into current focus.
Args:
text: Text to type
delay: Delay between characters (for slow typing effect)
Returns:
Success status
"""
if delay > 0:
# Type character by character with delay
for char in text:
if not self._type_single(char):
return False
time.sleep(delay)
return True
# Type all at once (efficient)
return self._type_single(text)
def _type_single(self, text: str) -> bool:
"""Type text using IDB."""
cmd = ["idb", "ui", "text", text]
if self.udid:
cmd.extend(["--udid", self.udid])
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def press_key(self, key: str, count: int = 1) -> bool:
"""
Press a special key.
Args:
key: Key name (return, delete, tab, etc.)
count: Number of times to press
Returns:
Success status
"""
# Map key name to IDB key code
key_code = self.SPECIAL_KEYS.get(key.lower())
if not key_code:
# Try as literal integer key code
try:
key_code = int(key)
except ValueError:
return False
cmd = ["idb", "ui", "key", str(key_code)]
if self.udid:
cmd.extend(["--udid", self.udid])
try:
for _ in range(count):
subprocess.run(cmd, capture_output=True, check=True)
if count > 1:
time.sleep(0.1) # Small delay for multiple presses
return True
except subprocess.CalledProcessError:
return False
def press_key_sequence(self, keys: list[str]) -> bool:
"""
Press a sequence of keys.
Args:
keys: List of key names
Returns:
Success status
"""
cmd_base = ["idb", "ui", "key-sequence"]
# Map keys to codes
mapped_keys = []
for key in keys:
mapped = self.SPECIAL_KEYS.get(key.lower())
if mapped is None:
# Try as integer
try:
mapped = int(key)
except ValueError:
return False
mapped_keys.append(str(mapped))
cmd = cmd_base + mapped_keys
if self.udid:
cmd.extend(["--udid", self.udid])
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def press_hardware_button(self, button: str) -> bool:
"""
Press hardware button.
Args:
button: Button name (home, lock, volume-up, etc.)
Returns:
Success status
"""
button_code = self.HARDWARE_BUTTONS.get(button.lower())
if not button_code:
return False
cmd = ["idb", "ui", "button", button_code]
if self.udid:
cmd.extend(["--udid", self.udid])
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def clear_text(self, select_all: bool = True) -> bool:
"""
Clear text in current field.
Args:
select_all: Use Cmd+A to select all first
Returns:
Success status
"""
if select_all:
# Select all then delete
# Note: This might need adjustment for iOS keyboard shortcuts
success = self.press_key_combo(["cmd", "a"])
if success:
return self.press_key("delete")
else:
# Just delete multiple times
return self.press_key("delete", count=50)
return False
def press_key_combo(self, keys: list[str]) -> bool:
"""
Press key combination (like Cmd+A).
Args:
keys: List of keys to press together
Returns:
Success status
"""
# IDB doesn't directly support key combos
# This is a workaround - may need platform-specific handling
if "cmd" in keys or "command" in keys:
# Handle common shortcuts
if "a" in keys:
# Select all - might work with key sequence
return self.press_key_sequence(["command", "a"])
if "c" in keys:
return self.press_key_sequence(["command", "c"])
if "v" in keys:
return self.press_key_sequence(["command", "v"])
if "x" in keys:
return self.press_key_sequence(["command", "x"])
# Try as sequence
return self.press_key_sequence(keys)
def dismiss_keyboard(self) -> bool:
"""Dismiss on-screen keyboard."""
# Common ways to dismiss keyboard on iOS
# Try Done button first, then Return
success = self.press_key("return")
if not success:
# Try tapping outside (would need coordinate)
pass
return success
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Control keyboard and hardware buttons")
# Text input
parser.add_argument("--type", help="Type text into current focus")
parser.add_argument("--slow", action="store_true", help="Type slowly (character by character)")
# Special keys
parser.add_argument("--key", help="Press special key (return, delete, tab, space, etc.)")
parser.add_argument("--key-sequence", help="Press key sequence (comma-separated)")
parser.add_argument("--count", type=int, default=1, help="Number of times to press key")
# Hardware buttons
parser.add_argument(
"--button",
choices=["home", "lock", "power", "volume-up", "volume-down", "ringer", "screenshot"],
help="Press hardware button",
)
# Other operations
parser.add_argument("--clear", action="store_true", help="Clear current text field")
parser.add_argument("--dismiss", action="store_true", help="Dismiss keyboard")
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
args = parser.parse_args()
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
print(f"Error: {e}")
sys.exit(1)
controller = KeyboardController(udid=udid)
# Execute requested action
if args.type:
delay = 0.1 if args.slow else 0.0
if controller.type_text(args.type, delay):
if args.slow:
print(f'Typed: "{args.type}" (slowly)')
else:
print(f'Typed: "{args.type}"')
else:
print("Failed to type text")
sys.exit(1)
elif args.key:
if controller.press_key(args.key, args.count):
if args.count > 1:
print(f"Pressed {args.key} ({args.count}x)")
else:
print(f"Pressed {args.key}")
else:
print(f"Failed to press {args.key}")
sys.exit(1)
elif args.key_sequence:
keys = args.key_sequence.split(",")
if controller.press_key_sequence(keys):
print(f"Pressed sequence: {' -> '.join(keys)}")
else:
print("Failed to press key sequence")
sys.exit(1)
elif args.button:
if controller.press_hardware_button(args.button):
print(f"Pressed {args.button} button")
else:
print(f"Failed to press {args.button}")
sys.exit(1)
elif args.clear:
if controller.clear_text():
print("Cleared text field")
else:
print("Failed to clear text")
sys.exit(1)
elif args.dismiss:
if controller.dismiss_keyboard():
print("Dismissed keyboard")
else:
print("Failed to dismiss keyboard")
sys.exit(1)
else:
parser.print_help()
sys.exit(1)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,486 @@
#!/usr/bin/env python3
"""
iOS Simulator Log Monitoring and Analysis
Real-time log streaming from iOS simulators with intelligent filtering, error detection,
and token-efficient summarization. Enhanced version of app_state_capture.py's log capture.
Features:
- Real-time log streaming from booted simulators
- Smart filtering by app bundle ID, subsystem, category, severity
- Error/warning classification and deduplication
- Duration-based or continuous follow mode
- Token-efficient summaries with full logs saved to file
- Integration with test_recorder and app_state_capture
Usage Examples:
# Monitor app logs in real-time (follow mode)
python scripts/log_monitor.py --app com.myapp.MyApp --follow
# Capture logs for specific duration
python scripts/log_monitor.py --app com.myapp.MyApp --duration 30s
# Extract errors and warnings only from last 5 minutes
python scripts/log_monitor.py --severity error,warning --last 5m
# Save logs to file
python scripts/log_monitor.py --app com.myapp.MyApp --duration 1m --output logs/
# Verbose output with full log lines
python scripts/log_monitor.py --app com.myapp.MyApp --verbose
"""
import argparse
import json
import re
import signal
import subprocess
import sys
from datetime import datetime, timedelta
from pathlib import Path
class LogMonitor:
"""Monitor and analyze iOS simulator logs with intelligent filtering."""
def __init__(
self,
app_bundle_id: str | None = None,
device_udid: str | None = None,
severity_filter: list[str] | None = None,
):
"""
Initialize log monitor.
Args:
app_bundle_id: Filter logs by app bundle ID
device_udid: Device UDID (uses booted if not specified)
severity_filter: List of severities to include (error, warning, info, debug)
"""
self.app_bundle_id = app_bundle_id
self.device_udid = device_udid or "booted"
self.severity_filter = severity_filter or ["error", "warning", "info", "debug"]
# Log storage
self.log_lines: list[str] = []
self.errors: list[str] = []
self.warnings: list[str] = []
self.info_messages: list[str] = []
# Statistics
self.error_count = 0
self.warning_count = 0
self.info_count = 0
self.debug_count = 0
self.total_lines = 0
# Deduplication
self.seen_messages: set[str] = set()
# Process control
self.log_process: subprocess.Popen | None = None
self.interrupted = False
def parse_time_duration(self, duration_str: str) -> float:
"""
Parse duration string to seconds.
Args:
duration_str: Duration like "30s", "5m", "1h"
Returns:
Duration in seconds
"""
match = re.match(r"(\d+)([smh])", duration_str.lower())
if not match:
raise ValueError(
f"Invalid duration format: {duration_str}. Use format like '30s', '5m', '1h'"
)
value, unit = match.groups()
value = int(value)
if unit == "s":
return value
if unit == "m":
return value * 60
if unit == "h":
return value * 3600
return 0
def classify_log_line(self, line: str) -> str | None:
"""
Classify log line by severity.
Args:
line: Log line to classify
Returns:
Severity level (error, warning, info, debug) or None
"""
line_lower = line.lower()
# Error patterns
error_patterns = [
r"\berror\b",
r"\bfault\b",
r"\bfailed\b",
r"\bexception\b",
r"\bcrash\b",
]
# Warning patterns
warning_patterns = [r"\bwarning\b", r"\bwarn\b", r"\bdeprecated\b", r"⚠️"]
# Info patterns
info_patterns = [r"\binfo\b", r"\bnotice\b"]
for pattern in error_patterns:
if re.search(pattern, line_lower):
return "error"
for pattern in warning_patterns:
if re.search(pattern, line_lower):
return "warning"
for pattern in info_patterns:
if re.search(pattern, line_lower):
return "info"
return "debug"
def deduplicate_message(self, line: str) -> bool:
"""
Check if message is duplicate.
Args:
line: Log line
Returns:
True if this is a new message, False if duplicate
"""
# Create signature by removing timestamps and process IDs
signature = re.sub(r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}", "", line)
signature = re.sub(r"\[\d+\]", "", signature)
signature = re.sub(r"\s+", " ", signature).strip()
if signature in self.seen_messages:
return False
self.seen_messages.add(signature)
return True
def process_log_line(self, line: str):
"""
Process a single log line.
Args:
line: Log line to process
"""
if not line.strip():
return
self.total_lines += 1
self.log_lines.append(line)
# Classify severity
severity = self.classify_log_line(line)
# Skip if not in filter
if severity not in self.severity_filter:
return
# Deduplicate (for errors and warnings)
if severity in ["error", "warning"] and not self.deduplicate_message(line):
return
# Store by severity
if severity == "error":
self.error_count += 1
self.errors.append(line)
elif severity == "warning":
self.warning_count += 1
self.warnings.append(line)
elif severity == "info":
self.info_count += 1
if len(self.info_messages) < 20: # Keep only recent info
self.info_messages.append(line)
else: # debug
self.debug_count += 1
def stream_logs(
self,
follow: bool = False,
duration: float | None = None,
last_minutes: float | None = None,
) -> bool:
"""
Stream logs from simulator.
Args:
follow: Follow mode (continuous streaming)
duration: Capture duration in seconds
last_minutes: Show logs from last N minutes
Returns:
True if successful
"""
# Build log stream command
# `log stream` is live-only; historical queries (--start) need `log show`
cmd = ["xcrun", "simctl", "spawn", self.device_udid, "log", "show" if last_minutes else "stream"]
# Add filters
if self.app_bundle_id:
# Filter by process name (extracted from bundle ID)
app_name = self.app_bundle_id.split(".")[-1]
cmd.extend(["--predicate", f'processImagePath CONTAINS "{app_name}"'])
# Add time filter for historical logs
if last_minutes:
start_time = datetime.now() - timedelta(minutes=last_minutes)
time_str = start_time.strftime("%Y-%m-%d %H:%M:%S")
cmd.extend(["--start", time_str])
# Setup signal handler for graceful interruption
def signal_handler(sig, frame):
self.interrupted = True
if self.log_process:
self.log_process.terminate()
signal.signal(signal.SIGINT, signal_handler)
try:
# Start log streaming process
self.log_process = subprocess.Popen(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
bufsize=1, # Line buffered
)
# Track start time for duration
start_time = datetime.now()
# Process log lines
for line in iter(self.log_process.stdout.readline, ""):
if not line:
break
# Process the line
self.process_log_line(line.rstrip())
# Print in follow mode
if follow:
severity = self.classify_log_line(line)
if severity in self.severity_filter:
print(line.rstrip())
# Check duration
if duration and (datetime.now() - start_time).total_seconds() >= duration:
break
# Check if interrupted
if self.interrupted:
break
# Wait for process to finish
self.log_process.wait()
return True
except Exception as e:
print(f"Error streaming logs: {e}", file=sys.stderr)
return False
finally:
if self.log_process:
self.log_process.terminate()
def get_summary(self, verbose: bool = False) -> str:
"""
Get log summary.
Args:
verbose: Include full log details
Returns:
Formatted summary string
"""
lines = []
# Header
if self.app_bundle_id:
lines.append(f"Logs for: {self.app_bundle_id}")
else:
lines.append("Logs for: All processes")
# Statistics
lines.append(f"Total lines: {self.total_lines}")
lines.append(
f"Errors: {self.error_count}, Warnings: {self.warning_count}, Info: {self.info_count}"
)
# Top issues
if self.errors:
lines.append(f"\nTop Errors ({len(self.errors)}):")
for error in self.errors[:5]: # Show first 5
lines.append(f"  ❌ {error[:120]}")  # Truncate long lines
if self.warnings:
lines.append(f"\nTop Warnings ({len(self.warnings)}):")
for warning in self.warnings[:5]: # Show first 5
lines.append(f" ⚠️ {warning[:120]}")
# Verbose output
if verbose and self.log_lines:
lines.append("\n=== Recent Log Lines ===")
for line in self.log_lines[-50:]: # Last 50 lines
lines.append(line)
return "\n".join(lines)
def get_json_output(self) -> dict:
"""Get log results as JSON."""
return {
"app_bundle_id": self.app_bundle_id,
"device_udid": self.device_udid,
"statistics": {
"total_lines": self.total_lines,
"errors": self.error_count,
"warnings": self.warning_count,
"info": self.info_count,
"debug": self.debug_count,
},
"errors": self.errors[:20], # Limit to 20
"warnings": self.warnings[:20],
"sample_logs": self.log_lines[-50:], # Last 50 lines
}
def save_logs(self, output_dir: str) -> str:
"""
Save logs to file.
Args:
output_dir: Directory to save logs
Returns:
Path to saved log file
"""
# Create output directory
output_path = Path(output_dir)
output_path.mkdir(parents=True, exist_ok=True)
# Generate filename with timestamp
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
app_name = self.app_bundle_id.split(".")[-1] if self.app_bundle_id else "simulator"
log_file = output_path / f"{app_name}-{timestamp}.log"
# Write all log lines
with open(log_file, "w") as f:
f.write("\n".join(self.log_lines))
# Also save JSON summary
json_file = output_path / f"{app_name}-{timestamp}-summary.json"
with open(json_file, "w") as f:
json.dump(self.get_json_output(), f, indent=2)
return str(log_file)
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(
description="Monitor and analyze iOS simulator logs",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Monitor app in real-time
python scripts/log_monitor.py --app com.myapp.MyApp --follow
# Capture logs for 30 seconds
python scripts/log_monitor.py --app com.myapp.MyApp --duration 30s
# Show errors/warnings from last 5 minutes
python scripts/log_monitor.py --severity error,warning --last 5m
# Save logs to file
python scripts/log_monitor.py --app com.myapp.MyApp --duration 1m --output logs/
""",
)
# Filtering options
parser.add_argument(
"--app", dest="app_bundle_id", help="App bundle ID to filter logs (e.g., com.myapp.MyApp)"
)
parser.add_argument("--device-udid", help="Device UDID (uses booted if not specified)")
parser.add_argument(
"--severity", help="Comma-separated severity levels (error,warning,info,debug)"
)
# Time options
time_group = parser.add_mutually_exclusive_group()
time_group.add_argument(
"--follow", action="store_true", help="Follow mode (continuous streaming)"
)
time_group.add_argument("--duration", help="Capture duration (e.g., 30s, 5m, 1h)")
time_group.add_argument(
"--last", dest="last_minutes", help="Show logs from last N minutes (e.g., 5m)"
)
# Output options
parser.add_argument("--output", help="Save logs to directory")
parser.add_argument("--verbose", action="store_true", help="Show detailed output")
parser.add_argument("--json", action="store_true", help="Output as JSON")
args = parser.parse_args()
# Parse severity filter
severity_filter = None
if args.severity:
severity_filter = [s.strip().lower() for s in args.severity.split(",")]
# Initialize monitor
monitor = LogMonitor(
app_bundle_id=args.app_bundle_id,
device_udid=args.device_udid,
severity_filter=severity_filter,
)
# Parse duration
duration = None
if args.duration:
duration = monitor.parse_time_duration(args.duration)
# Parse last minutes
last_minutes = None
if args.last_minutes:
last_minutes = monitor.parse_time_duration(args.last_minutes) / 60
# Stream logs
print("Monitoring logs...", file=sys.stderr)
if args.app_bundle_id:
print(f"App: {args.app_bundle_id}", file=sys.stderr)
success = monitor.stream_logs(follow=args.follow, duration=duration, last_minutes=last_minutes)
if not success:
sys.exit(1)
# Save logs if requested
if args.output:
log_file = monitor.save_logs(args.output)
print(f"\nLogs saved to: {log_file}", file=sys.stderr)
# Output results
if not args.follow: # Don't show summary in follow mode
if args.json:
print(json.dumps(monitor.get_json_output(), indent=2))
else:
print("\n" + monitor.get_summary(verbose=args.verbose))
sys.exit(0)
if __name__ == "__main__":
main()
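Deduplication hinges on `deduplicate_message` reducing each line to a timestamp- and PID-free signature, so the same message logged repeatedly is counted once. The normalization on its own:

```python
import re

def log_signature(line: str) -> str:
    """Collapse a log line to a signature that ignores timestamp and [pid]."""
    sig = re.sub(r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}", "", line)  # strip timestamp
    sig = re.sub(r"\[\d+\]", "", sig)                                 # strip process ID
    return re.sub(r"\s+", " ", sig).strip()

a = log_signature("2025-01-02 10:00:01 MyApp[123] error: disk full")
b = log_signature("2025-01-02 10:05:59 MyApp[456] error: disk full")
print(a == b)  # True -- the second occurrence would be suppressed
```

The trade-off is that messages differing only in embedded numbers other than PIDs (ports, counts) are still treated as distinct, which keeps genuinely different errors visible.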

View File

@@ -0,0 +1,453 @@
#!/usr/bin/env python3
"""
iOS Simulator Navigator - Smart Element Finder and Interactor
Finds and interacts with UI elements using accessibility data.
Prioritizes structured navigation over pixel-based interaction.
This script is the core automation tool for iOS simulator navigation. It finds
UI elements by text, type, or accessibility ID and performs actions on them
(tap, enter text). Uses semantic element finding instead of fragile pixel coordinates.
Key Features:
- Find elements by text (fuzzy or exact matching)
- Find elements by type (Button, TextField, etc.)
- Find elements by accessibility identifier
- Tap elements at their center point
- Enter text into text fields
- List all tappable elements on screen
- Automatic element caching for performance
Usage Examples:
# Find and tap a button by text
python scripts/navigator.py --find-text "Login" --tap --udid <device-id>
# Enter text into first text field
python scripts/navigator.py --find-type TextField --index 0 --enter-text "username" --udid <device-id>
# Tap element by accessibility ID
python scripts/navigator.py --find-id "submitButton" --tap --udid <device-id>
# List all interactive elements
python scripts/navigator.py --list --udid <device-id>
# Tap at specific coordinates (fallback)
python scripts/navigator.py --tap-at 200,400 --udid <device-id>
Output Format:
Tapped: Button "Login" at (320, 450)
Entered text in: TextField "Username"
Not found: text='Submit'
Navigation Priority (best to worst):
1. Find by accessibility label/text (most reliable)
2. Find by element type + index (good for forms)
3. Find by accessibility ID (precise but app-specific)
4. Tap at coordinates (last resort, fragile)
Technical Details:
- Uses IDB's accessibility tree via `idb ui describe-all --json --nested`
- Caches tree for multiple operations (call with force_refresh to update)
- Finds elements by parsing tree recursively
- Calculates tap coordinates from element frame center
- Uses `idb ui tap` for tapping, `idb ui text` for text entry
- Extracts data from AXLabel, AXValue, and AXUniqueId fields
"""
import argparse
import json
import subprocess
import sys
from dataclasses import dataclass
from common import (
flatten_tree,
get_accessibility_tree,
get_device_screen_size,
resolve_udid,
transform_screenshot_coords,
)
@dataclass
class Element:
"""Represents a UI element from accessibility tree."""
type: str
label: str | None
value: str | None
identifier: str | None
frame: dict[str, float]
traits: list[str]
enabled: bool = True
@property
def center(self) -> tuple[int, int]:
"""Calculate center point for tapping."""
x = int(self.frame["x"] + self.frame["width"] / 2)
y = int(self.frame["y"] + self.frame["height"] / 2)
return (x, y)
@property
def description(self) -> str:
"""Human-readable description."""
label = self.label or self.value or self.identifier or "Unnamed"
return f'{self.type} "{label}"'
class Navigator:
"""Navigates iOS apps using accessibility data."""
def __init__(self, udid: str | None = None):
"""Initialize navigator with optional device UDID."""
self.udid = udid
self._tree_cache = None
def get_accessibility_tree(self, force_refresh: bool = False) -> dict:
"""Get accessibility tree (cached for efficiency)."""
if self._tree_cache and not force_refresh:
return self._tree_cache
# Delegate to shared utility
self._tree_cache = get_accessibility_tree(self.udid, nested=True)
return self._tree_cache
def _flatten_tree(self, node: dict, elements: list[Element] | None = None) -> list[Element]:
"""Flatten accessibility tree into list of elements."""
if elements is None:
elements = []
# Create element from node
if node.get("type"):
element = Element(
type=node.get("type", "Unknown"),
label=node.get("AXLabel"),
value=node.get("AXValue"),
identifier=node.get("AXUniqueId"),
frame=node.get("frame", {}),
traits=node.get("traits", []),
enabled=node.get("enabled", True),
)
elements.append(element)
# Process children
for child in node.get("children", []):
self._flatten_tree(child, elements)
return elements
def find_element(
self,
text: str | None = None,
element_type: str | None = None,
identifier: str | None = None,
index: int = 0,
fuzzy: bool = True,
) -> Element | None:
"""
Find element by various criteria.
Args:
text: Text to search in label/value
element_type: Type of element (Button, TextField, etc.)
identifier: Accessibility identifier
index: Which matching element to return (0-based)
fuzzy: Use fuzzy matching for text
Returns:
Element if found, None otherwise
"""
tree = self.get_accessibility_tree()
elements = self._flatten_tree(tree)
matches = []
for elem in elements:
# Skip disabled elements
if not elem.enabled:
continue
# Check type
if element_type and elem.type != element_type:
continue
# Check identifier (exact match)
if identifier and elem.identifier != identifier:
continue
# Check text (in label or value)
if text:
elem_text = (elem.label or "") + " " + (elem.value or "")
if fuzzy:
if text.lower() not in elem_text.lower():
continue
elif text not in (elem.label, elem.value):
continue
matches.append(elem)
if matches and index < len(matches):
return matches[index]
return None
def tap(self, element: Element) -> bool:
"""Tap on an element."""
x, y = element.center
return self.tap_at(x, y)
def tap_at(self, x: int, y: int) -> bool:
"""Tap at specific coordinates."""
cmd = ["idb", "ui", "tap", str(x), str(y)]
if self.udid:
cmd.extend(["--udid", self.udid])
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def enter_text(self, text: str, element: Element | None = None) -> bool:
"""
Enter text into element or current focus.
Args:
text: Text to enter
element: Optional element to tap first
Returns:
Success status
"""
# Tap element if provided
if element:
if not self.tap(element):
return False
# Small delay for focus
import time
time.sleep(0.5)
# Enter text
cmd = ["idb", "ui", "text", text]
if self.udid:
cmd.extend(["--udid", self.udid])
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def find_and_tap(
self,
text: str | None = None,
element_type: str | None = None,
identifier: str | None = None,
index: int = 0,
) -> tuple[bool, str]:
"""
Find element and tap it.
Returns:
(success, message) tuple
"""
element = self.find_element(text, element_type, identifier, index)
if not element:
criteria = []
if text:
criteria.append(f"text='{text}'")
if element_type:
criteria.append(f"type={element_type}")
if identifier:
criteria.append(f"id={identifier}")
return (False, f"Not found: {', '.join(criteria)}")
if self.tap(element):
return (True, f"Tapped: {element.description} at {element.center}")
return (False, f"Failed to tap: {element.description}")
def find_and_enter_text(
self,
text_to_enter: str,
find_text: str | None = None,
element_type: str | None = "TextField",
identifier: str | None = None,
index: int = 0,
) -> tuple[bool, str]:
"""
Find element and enter text into it.
Returns:
(success, message) tuple
"""
element = self.find_element(find_text, element_type, identifier, index)
if not element:
return (False, "TextField not found")
if self.enter_text(text_to_enter, element):
return (True, f"Entered text in: {element.description}")
return (False, "Failed to enter text")
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Navigate iOS apps using accessibility data")
# Finding options
parser.add_argument("--find-text", help="Find element by text (fuzzy match)")
parser.add_argument("--find-exact", help="Find element by exact text")
parser.add_argument("--find-type", help="Element type (Button, TextField, etc.)")
parser.add_argument("--find-id", help="Accessibility identifier")
parser.add_argument("--index", type=int, default=0, help="Which match to use (0-based)")
# Action options
parser.add_argument("--tap", action="store_true", help="Tap the found element")
parser.add_argument("--tap-at", help="Tap at coordinates (x,y)")
parser.add_argument("--enter-text", help="Enter text into element")
# Coordinate transformation
parser.add_argument(
"--screenshot-coords",
action="store_true",
help="Interpret tap coordinates as from a screenshot (requires --screenshot-width/height)",
)
parser.add_argument(
"--screenshot-width",
type=int,
help="Screenshot width for coordinate transformation",
)
parser.add_argument(
"--screenshot-height",
type=int,
help="Screenshot height for coordinate transformation",
)
# Other options
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
parser.add_argument("--list", action="store_true", help="List all tappable elements")
args = parser.parse_args()
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
print(f"Error: {e}")
sys.exit(1)
navigator = Navigator(udid=udid)
# List mode
if args.list:
tree = navigator.get_accessibility_tree()
elements = navigator._flatten_tree(tree)
# Filter to tappable elements
tappable = [
e
for e in elements
if e.enabled and e.type in ["Button", "Link", "Cell", "TextField", "SecureTextField"]
]
print(f"Tappable elements ({len(tappable)}):")
for elem in tappable[:10]: # Limit output for tokens
print(f" {elem.type}: \"{elem.label or elem.value or 'Unnamed'}\" {elem.center}")
if len(tappable) > 10:
print(f" ... and {len(tappable) - 10} more")
sys.exit(0)
# Direct tap at coordinates
if args.tap_at:
coords = args.tap_at.split(",")
if len(coords) != 2:
print("Error: --tap-at requires x,y format")
sys.exit(1)
x, y = int(coords[0]), int(coords[1])
# Handle coordinate transformation if requested
if args.screenshot_coords:
if not args.screenshot_width or not args.screenshot_height:
print(
"Error: --screenshot-coords requires --screenshot-width and --screenshot-height"
)
sys.exit(1)
device_w, device_h = get_device_screen_size(udid)
x, y = transform_screenshot_coords(
x,
y,
args.screenshot_width,
args.screenshot_height,
device_w,
device_h,
)
print(
f"Transformed screenshot coords ({coords[0]}, {coords[1]}) "
f"to device coords ({x}, {y})"
)
if navigator.tap_at(x, y):
print(f"Tapped at ({x}, {y})")
else:
print(f"Failed to tap at ({x}, {y})")
sys.exit(1)
# Find and tap
elif args.tap:
text = args.find_text or args.find_exact
success, message = navigator.find_and_tap(
text=text, element_type=args.find_type, identifier=args.find_id, index=args.index
)
print(message)
if not success:
sys.exit(1)
# Find and enter text
elif args.enter_text:
text = args.find_text or args.find_exact
success, message = navigator.find_and_enter_text(
text_to_enter=args.enter_text,
find_text=text,
element_type=args.find_type or "TextField",
identifier=args.find_id,
index=args.index,
)
print(message)
if not success:
sys.exit(1)
# Just find (no action)
else:
text = args.find_text or args.find_exact
fuzzy = args.find_text is not None
element = navigator.find_element(
text=text,
element_type=args.find_type,
identifier=args.find_id,
index=args.index,
fuzzy=fuzzy,
)
if element:
print(f"Found: {element.description} at {element.center}")
else:
print("Element not found")
sys.exit(1)
if __name__ == "__main__":
main()
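Every tap ultimately goes through `Element.center`, which turns an accessibility frame into integer tap coordinates. The arithmetic, extracted as a sketch:

```python
def frame_center(frame: dict[str, float]) -> tuple[int, int]:
    """Center of an accessibility frame, truncated to integer coordinates."""
    x = int(frame["x"] + frame["width"] / 2)
    y = int(frame["y"] + frame["height"] / 2)
    return (x, y)

# A 200x44-point button whose frame starts at (20, 400):
print(frame_center({"x": 20, "y": 400, "width": 200, "height": 44}))  # (120, 422)
```

Tapping the geometric center rather than the frame origin keeps the tap inside the hit area even when the element has rounded corners or insets.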

View File

@@ -0,0 +1,310 @@
#!/usr/bin/env python3
"""
iOS Privacy & Permissions Manager
Grant/revoke app permissions for testing permission flows.
Supports 13+ services with audit trail tracking.
Usage: python scripts/privacy_manager.py --grant camera --bundle-id com.app
"""
import argparse
import subprocess
import sys
from datetime import datetime
from common import resolve_udid
class PrivacyManager:
"""Manages iOS app privacy and permissions."""
# Supported services
SUPPORTED_SERVICES = {
"camera": "Camera access",
"microphone": "Microphone access",
"location": "Location services",
"contacts": "Contacts access",
"photos": "Photos library access",
"calendar": "Calendar access",
"health": "Health data access",
"reminders": "Reminders access",
"motion": "Motion & fitness",
"keyboard": "Keyboard access",
"mediaLibrary": "Media library",
"calls": "Call history",
"siri": "Siri access",
}
def __init__(self, udid: str | None = None):
"""Initialize privacy manager.
Args:
udid: Optional device UDID (auto-detects booted simulator if None)
"""
self.udid = udid
def grant_permission(
self,
bundle_id: str,
service: str,
scenario: str | None = None,
step: int | None = None,
) -> bool:
"""
Grant permission for app.
Args:
bundle_id: App bundle ID
service: Service name (camera, microphone, location, etc.)
scenario: Test scenario name for audit trail
step: Step number in test scenario
Returns:
Success status
"""
if service not in self.SUPPORTED_SERVICES:
print(f"Error: Unknown service '{service}'")
print(f"Supported: {', '.join(self.SUPPORTED_SERVICES.keys())}")
return False
cmd = ["xcrun", "simctl", "privacy"]
if self.udid:
cmd.append(self.udid)
else:
cmd.append("booted")
cmd.extend(["grant", service, bundle_id])
try:
subprocess.run(cmd, capture_output=True, check=True)
# Log audit entry
self._log_audit("grant", bundle_id, service, scenario, step)
return True
except subprocess.CalledProcessError:
return False
def revoke_permission(
self,
bundle_id: str,
service: str,
scenario: str | None = None,
step: int | None = None,
) -> bool:
"""
Revoke permission for app.
Args:
bundle_id: App bundle ID
service: Service name
scenario: Test scenario name for audit trail
step: Step number in test scenario
Returns:
Success status
"""
if service not in self.SUPPORTED_SERVICES:
print(f"Error: Unknown service '{service}'")
return False
cmd = ["xcrun", "simctl", "privacy"]
if self.udid:
cmd.append(self.udid)
else:
cmd.append("booted")
cmd.extend(["revoke", service, bundle_id])
try:
subprocess.run(cmd, capture_output=True, check=True)
# Log audit entry
self._log_audit("revoke", bundle_id, service, scenario, step)
return True
except subprocess.CalledProcessError:
return False
def reset_permission(
self,
bundle_id: str,
service: str,
scenario: str | None = None,
step: int | None = None,
) -> bool:
"""
Reset permission to default.
Args:
bundle_id: App bundle ID
service: Service name
scenario: Test scenario name for audit trail
step: Step number in test scenario
Returns:
Success status
"""
if service not in self.SUPPORTED_SERVICES:
print(f"Error: Unknown service '{service}'")
return False
cmd = ["xcrun", "simctl", "privacy"]
if self.udid:
cmd.append(self.udid)
else:
cmd.append("booted")
cmd.extend(["reset", service, bundle_id])
try:
subprocess.run(cmd, capture_output=True, check=True)
# Log audit entry
self._log_audit("reset", bundle_id, service, scenario, step)
return True
except subprocess.CalledProcessError:
return False
@staticmethod
def _log_audit(
action: str,
bundle_id: str,
service: str,
scenario: str | None = None,
step: int | None = None,
) -> None:
"""Log permission change to audit trail (for test tracking).
Args:
action: grant, revoke, or reset
bundle_id: App bundle ID
service: Service name
scenario: Test scenario name
step: Step number
"""
# Could write to file, but for now just log to stdout for transparency
timestamp = datetime.now().isoformat()
location = f" (step {step})" if step else ""
scenario_info = f" in {scenario}" if scenario else ""
print(
f"[Audit] {timestamp}: {action.upper()} {service} for {bundle_id}{scenario_info}{location}"
)
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Manage iOS app privacy and permissions")
# Required
parser.add_argument("--bundle-id", required=True, help="App bundle ID (e.g., com.example.app)")
# Action (mutually exclusive)
action_group = parser.add_mutually_exclusive_group(required=True)
action_group.add_argument(
"--grant",
help="Grant permission (service name or comma-separated list)",
)
action_group.add_argument(
"--revoke", help="Revoke permission (service name or comma-separated list)"
)
action_group.add_argument(
"--reset",
help="Reset permission to default (service name or comma-separated list)",
)
action_group.add_argument(
"--list",
action="store_true",
help="List all supported services",
)
# Test tracking
parser.add_argument(
"--scenario",
help="Test scenario name for audit trail",
)
parser.add_argument("--step", type=int, help="Step number in test scenario")
# Device
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
args = parser.parse_args()
# List supported services
if args.list:
print("Supported Privacy Services:\n")
for service, description in PrivacyManager.SUPPORTED_SERVICES.items():
print(f" {service:<15} - {description}")
print()
print("Examples:")
print(" python scripts/privacy_manager.py --grant camera --bundle-id com.app")
print(" python scripts/privacy_manager.py --revoke location --bundle-id com.app")
print(" python scripts/privacy_manager.py --grant camera,photos --bundle-id com.app")
sys.exit(0)
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
print(f"Error: {e}")
sys.exit(1)
manager = PrivacyManager(udid=udid)
# Parse service names (support comma-separated list)
if args.grant:
services = [s.strip() for s in args.grant.split(",")]
action = "grant"
action_fn = manager.grant_permission
elif args.revoke:
services = [s.strip() for s in args.revoke.split(",")]
action = "revoke"
action_fn = manager.revoke_permission
else: # reset
services = [s.strip() for s in args.reset.split(",")]
action = "reset"
action_fn = manager.reset_permission
# Execute action for each service
all_success = True
for service in services:
if service not in PrivacyManager.SUPPORTED_SERVICES:
print(f"Error: Unknown service '{service}'")
all_success = False
continue
success = action_fn(
args.bundle_id,
service,
scenario=args.scenario,
step=args.step,
)
if success:
description = PrivacyManager.SUPPORTED_SERVICES[service]
print(f"{action.capitalize()} {service}: {description}")
else:
print(f"✗ Failed to {action} {service}")
all_success = False
if not all_success:
sys.exit(1)
# Summary
if len(services) > 1:
print(f"\nPermissions {action}ed: {', '.join(services)}")
if args.scenario:
print(f"Test scenario: {args.scenario}" + (f" (step {args.step})" if args.step else ""))
if __name__ == "__main__":
main()


@@ -0,0 +1,240 @@
#!/usr/bin/env python3
"""
iOS Push Notification Simulator
Send simulated push notifications to test notification handling.
Supports custom payloads and test tracking.
Usage: python scripts/push_notification.py --bundle-id com.app --title "Alert" --body "Message"
"""
import argparse
import json
import subprocess
import sys
import tempfile
from pathlib import Path
from common import resolve_udid
class PushNotificationSender:
"""Sends simulated push notifications to iOS simulator."""
def __init__(self, udid: str | None = None):
"""Initialize push notification sender.
Args:
udid: Optional device UDID (auto-detects booted simulator if None)
"""
self.udid = udid
def send(
self,
bundle_id: str,
payload: dict | str,
_test_name: str | None = None,
_expected_behavior: str | None = None,
) -> bool:
"""
Send push notification to app.
Args:
bundle_id: Target app bundle ID
payload: Push payload (dict, JSON string, or path to a JSON file)
_test_name: Test scenario name (reserved for tracking; currently unused)
_expected_behavior: Expected behavior note (reserved; currently unused)
Returns:
Success status
"""
# Handle different payload formats
if isinstance(payload, str):
# Check if it's a file path
payload_path = Path(payload)
if payload_path.exists():
with open(payload_path) as f:
payload_data = json.load(f)
else:
# Try to parse as JSON string
try:
payload_data = json.loads(payload)
except json.JSONDecodeError:
print(f"Error: Invalid JSON payload: {payload}")
return False
else:
payload_data = payload
# Ensure payload has aps dictionary
if "aps" not in payload_data:
payload_data = {"aps": payload_data}
# Create temp file with payload; always clean it up, even on failure
temp_payload_path = None
try:
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
json.dump(payload_data, f)
temp_payload_path = f.name
# Build simctl command
cmd = ["xcrun", "simctl", "push"]
if self.udid:
cmd.append(self.udid)
else:
cmd.append("booted")
cmd.extend([bundle_id, temp_payload_path])
# Send notification
subprocess.run(cmd, capture_output=True, text=True, check=True)
return True
except subprocess.CalledProcessError as e:
print(f"Error sending push notification: {e.stderr}")
return False
except Exception as e:
print(f"Error: {e}")
return False
finally:
# Remove the temp payload file on every exit path
if temp_payload_path:
Path(temp_payload_path).unlink(missing_ok=True)
def send_simple(
self,
bundle_id: str,
title: str | None = None,
body: str | None = None,
badge: int | None = None,
sound: bool = True,
) -> bool:
"""
Send simple push notification with common parameters.
Args:
bundle_id: Target app bundle ID
title: Alert title
body: Alert body
badge: Badge number
sound: Whether to play sound
Returns:
Success status
"""
payload = {}
if title or body:
alert = {}
if title:
alert["title"] = title
if body:
alert["body"] = body
payload["alert"] = alert
if badge is not None:
payload["badge"] = badge
if sound:
payload["sound"] = "default"
# Wrap in aps
full_payload = {"aps": payload}
return self.send(bundle_id, full_payload)
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Send simulated push notification to iOS app")
# Required
parser.add_argument(
"--bundle-id", required=True, help="Target app bundle ID (e.g., com.example.app)"
)
# Simple payload options
parser.add_argument("--title", help="Alert title (for simple notifications)")
parser.add_argument("--body", help="Alert body message")
parser.add_argument("--badge", type=int, help="Badge number")
parser.add_argument("--no-sound", action="store_true", help="Don't play notification sound")
# Custom payload
parser.add_argument(
"--payload",
help="Custom JSON payload file or inline JSON string",
)
# Test tracking
parser.add_argument("--test-name", help="Test scenario name for tracking")
parser.add_argument(
"--expected",
help="Expected behavior after notification",
)
# Device
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
args = parser.parse_args()
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
print(f"Error: {e}")
sys.exit(1)
sender = PushNotificationSender(udid=udid)
# Send notification
if args.payload:
# Custom payload mode
success = sender.send(args.bundle_id, args.payload)
else:
# Simple notification mode
success = sender.send_simple(
args.bundle_id,
title=args.title,
body=args.body,
badge=args.badge,
sound=not args.no_sound,
)
if success:
# Token-efficient output
output = "Push notification sent"
if args.test_name:
output += f" (test: {args.test_name})"
print(output)
if args.expected:
print(f"Expected: {args.expected}")
print()
print("Notification details:")
if args.title:
print(f" Title: {args.title}")
if args.body:
print(f" Body: {args.body}")
if args.badge:
print(f" Badge: {args.badge}")
print()
print("Verify notification handling:")
print("1. Check app log output: python scripts/log_monitor.py --app " + args.bundle_id)
print(
"2. Capture state: python scripts/app_state_capture.py --app-bundle-id "
+ args.bundle_id
)
else:
print("Failed to send push notification")
sys.exit(1)
if __name__ == "__main__":
main()
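The `send` method above normalizes whatever payload form it receives (dict, JSON string, or file path) and wraps it in an `aps` envelope when one is missing. Setting aside the file-path branch, that normalization boils down to (sketch only; `normalize_payload` is not a name from the script):

```python
import json

def normalize_payload(payload) -> dict:
    """Parse a dict or JSON string and ensure an 'aps' envelope."""
    if isinstance(payload, str):
        payload = json.loads(payload)  # file-path handling omitted in this sketch
    if "aps" not in payload:
        payload = {"aps": payload}  # wrap bare payloads in the aps dictionary
    return payload

print(normalize_payload('{"alert": {"title": "Hi"}}'))
```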


@@ -0,0 +1,292 @@
#!/usr/bin/env python3
"""
iOS Screen Mapper - Current Screen Analyzer
Maps the current screen's UI elements for navigation decisions.
Provides token-efficient summaries of available interactions.
This script analyzes the iOS simulator screen using IDB's accessibility tree
and provides a compact, actionable summary of what's currently visible and
interactive on the screen. Perfect for AI agents making navigation decisions.
Key Features:
- Token-efficient output (5-7 lines by default)
- Identifies buttons, text fields, navigation elements
- Counts interactive and focusable elements
- Progressive detail with --verbose flag
- Navigation hints with --hints flag
Usage Examples:
# Quick summary (default)
python scripts/screen_mapper.py --udid <device-id>
# Detailed element breakdown
python scripts/screen_mapper.py --udid <device-id> --verbose
# Include navigation suggestions
python scripts/screen_mapper.py --udid <device-id> --hints
# Full JSON output for parsing
python scripts/screen_mapper.py --udid <device-id> --json
Output Format (default):
Screen: LoginViewController (45 elements, 7 interactive)
Buttons: "Login", "Cancel", "Forgot Password"
TextFields: 2 (0 filled)
Navigation: NavBar: "Sign In"
Focusable: 7 elements
Technical Details:
- Uses IDB's accessibility tree via `idb ui describe-all --json --nested`
- Parses IDB's array format: [{ root element with children }]
- Identifies element types: Button, TextField, NavigationBar, TabBar, etc.
- Extracts labels from AXLabel, AXValue, and AXUniqueId fields
"""
import argparse
import json
import subprocess
import sys
from collections import defaultdict
from common import get_accessibility_tree, resolve_udid
class ScreenMapper:
"""
Analyzes current screen for navigation decisions.
This class fetches the iOS accessibility tree from IDB and analyzes it
to provide actionable summaries for navigation. It categorizes elements
by type, counts interactive elements, and identifies key UI patterns.
Attributes:
udid (Optional[str]): Device UDID to target, or None for booted device
INTERACTIVE_TYPES (Set[str]): Element types that users can interact with
Design Philosophy:
- Token efficiency: Provide minimal but complete information
- Progressive disclosure: Summary by default, details on request
- Navigation-focused: Highlight elements relevant for automation
"""
# Element types we care about for navigation
# These are the accessibility element types that indicate user interaction points
INTERACTIVE_TYPES = {
"Button",
"Link",
"TextField",
"SecureTextField",
"Cell",
"Switch",
"Slider",
"Stepper",
"SegmentedControl",
"TabBar",
"NavigationBar",
"Toolbar",
}
def __init__(self, udid: str | None = None):
"""
Initialize screen mapper.
Args:
udid: Optional device UDID. If None, uses booted simulator.
Example:
mapper = ScreenMapper(udid="656DC652-1C9F-4AB2-AD4F-F38E65976BDA")
mapper = ScreenMapper() # Uses booted device
"""
self.udid = udid
def get_accessibility_tree(self) -> dict:
"""
Fetch accessibility tree from iOS simulator via IDB.
Delegates to shared utility for consistent tree fetching across all scripts.
"""
return get_accessibility_tree(self.udid, nested=True)
def analyze_tree(self, node: dict, depth: int = 0) -> dict:
"""Analyze accessibility tree for navigation info."""
analysis = {
"elements_by_type": defaultdict(list),
"total_elements": 0,
"interactive_elements": 0,
"text_fields": [],
"buttons": [],
"navigation": {},
"screen_name": None,
"focusable": 0,
}
self._analyze_recursive(node, analysis, depth)
# Post-process for clean output
analysis["elements_by_type"] = dict(analysis["elements_by_type"])
return analysis
def _analyze_recursive(self, node: dict, analysis: dict, depth: int):
"""Recursively analyze tree nodes."""
elem_type = node.get("type")
label = node.get("AXLabel", "")
value = node.get("AXValue", "")
identifier = node.get("AXUniqueId", "")
# Count element
if elem_type:
analysis["total_elements"] += 1
# Track by type
if elem_type in self.INTERACTIVE_TYPES:
analysis["interactive_elements"] += 1
# Store concise info (label only, not full node)
elem_info = label or value or identifier or "Unnamed"
analysis["elements_by_type"][elem_type].append(elem_info)
# Special handling for common types
if elem_type == "Button":
analysis["buttons"].append(elem_info)
elif elem_type in ("TextField", "SecureTextField"):
analysis["text_fields"].append(
{"type": elem_type, "label": elem_info, "has_value": bool(value)}
)
elif elem_type == "NavigationBar":
analysis["navigation"]["nav_title"] = label or "Navigation"
elif elem_type == "TabBar":
# Count tab items
tab_count = len(node.get("children", []))
analysis["navigation"]["tab_count"] = tab_count
# Track focusable elements
if node.get("enabled", False) and elem_type in self.INTERACTIVE_TYPES:
analysis["focusable"] += 1
# Try to identify screen name from view controller
if not analysis["screen_name"] and identifier:
if "ViewController" in identifier or "Screen" in identifier:
analysis["screen_name"] = identifier
# Process children
for child in node.get("children", []):
self._analyze_recursive(child, analysis, depth + 1)
def format_summary(self, analysis: dict, verbose: bool = False) -> str:
"""Format analysis as token-efficient summary."""
lines = []
# Screen identification (1 line)
screen = analysis["screen_name"] or "Unknown Screen"
total = analysis["total_elements"]
interactive = analysis["interactive_elements"]
lines.append(f"Screen: {screen} ({total} elements, {interactive} interactive)")
# Buttons summary (1 line)
if analysis["buttons"]:
button_list = ", ".join(f'"{b}"' for b in analysis["buttons"][:5])
if len(analysis["buttons"]) > 5:
button_list += f" +{len(analysis['buttons']) - 5} more"
lines.append(f"Buttons: {button_list}")
# Text fields summary (1 line)
if analysis["text_fields"]:
field_count = len(analysis["text_fields"])
filled = sum(1 for f in analysis["text_fields"] if f["has_value"])
lines.append(f"TextFields: {field_count} ({filled} filled)")
# Navigation summary (1 line)
nav_parts = []
if "nav_title" in analysis["navigation"]:
nav_parts.append(f"NavBar: \"{analysis['navigation']['nav_title']}\"")
if "tab_count" in analysis["navigation"]:
nav_parts.append(f"TabBar: {analysis['navigation']['tab_count']} tabs")
if nav_parts:
lines.append(f"Navigation: {', '.join(nav_parts)}")
# Focusable count (1 line)
lines.append(f"Focusable: {analysis['focusable']} elements")
# Verbose mode adds element type breakdown
if verbose:
lines.append("\nElements by type:")
for elem_type, items in analysis["elements_by_type"].items():
if items: # Only show types that exist
lines.append(f" {elem_type}: {len(items)}")
for item in items[:3]: # Show first 3
lines.append(f" - {item}")
if len(items) > 3:
lines.append(f" ... +{len(items) - 3} more")
return "\n".join(lines)
def get_navigation_hints(self, analysis: dict) -> list[str]:
"""Generate navigation hints based on screen analysis."""
hints = []
# Check for common patterns
if "Login" in str(analysis.get("buttons", [])):
hints.append("Login screen detected - find TextFields for credentials")
if analysis["text_fields"]:
unfilled = [f for f in analysis["text_fields"] if not f["has_value"]]
if unfilled:
hints.append(f"{len(unfilled)} empty text field(s) - may need input")
if not analysis["buttons"] and not analysis["text_fields"]:
hints.append("No interactive elements - try swiping or going back")
if "tab_count" in analysis.get("navigation", {}):
hints.append(f"Tab bar available with {analysis['navigation']['tab_count']} tabs")
return hints
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Map current screen UI elements")
parser.add_argument("--verbose", action="store_true", help="Show detailed element breakdown")
parser.add_argument("--json", action="store_true", help="Output raw JSON analysis")
parser.add_argument("--hints", action="store_true", help="Include navigation hints")
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
args = parser.parse_args()
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
print(f"Error: {e}")
sys.exit(1)
# Create mapper and analyze
mapper = ScreenMapper(udid=udid)
tree = mapper.get_accessibility_tree()
analysis = mapper.analyze_tree(tree)
# Output based on format
if args.json:
# Full JSON (verbose)
print(json.dumps(analysis, indent=2, default=str))
else:
# Token-efficient summary (default)
summary = mapper.format_summary(analysis, verbose=args.verbose)
print(summary)
# Add hints if requested
if args.hints:
hints = mapper.get_navigation_hints(analysis)
if hints:
print("\nHints:")
for hint in hints:
print(f" - {hint}")
if __name__ == "__main__":
main()
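The `_analyze_recursive` walk above is a depth-first traversal that tallies nodes whose `type` is in the interactive set and recurses into `children`. A toy version of that counting on a two-level tree (the interactive set here is trimmed for brevity):

```python
INTERACTIVE = {"Button", "TextField"}

def count_interactive(node: dict) -> int:
    """Depth-first count of interactive elements in an accessibility tree."""
    n = 1 if node.get("type") in INTERACTIVE else 0
    for child in node.get("children", []):
        n += count_interactive(child)
    return n

tree = {"type": "Window", "children": [
    {"type": "Button", "children": []},
    {"type": "Other", "children": [{"type": "TextField", "children": []}]},
]}
print(count_interactive(tree))
```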


@@ -0,0 +1,239 @@
#!/usr/bin/env bash
#
# iOS Simulator Testing Environment Health Check
#
# Verifies that all required tools and dependencies are properly installed
# and configured for iOS simulator testing.
#
# Usage: bash scripts/sim_health_check.sh [--help]
set -e
# Color codes for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Check flags
SHOW_HELP=false
# Parse arguments
for arg in "$@"; do
case $arg in
--help|-h)
SHOW_HELP=true
;;
esac
done
if [ "$SHOW_HELP" = true ]; then
cat <<EOF
iOS Simulator Testing - Environment Health Check
Verifies that your environment is properly configured for iOS simulator testing.
Usage: bash scripts/sim_health_check.sh [options]
Options:
--help, -h Show this help message
This script checks for:
- Xcode Command Line Tools installation
- iOS Simulator availability
- IDB (iOS Development Bridge) installation
- Available simulator devices
- Python 3 installation (for scripts)
Exit codes:
0 - All checks passed
1 - One or more checks failed (see output for details)
EOF
exit 0
fi
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo -e "${BLUE} iOS Simulator Testing - Environment Health Check${NC}"
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo ""
CHECKS_PASSED=0
CHECKS_FAILED=0
# Function to print check status
check_passed() {
echo -e "${GREEN}✓${NC} $1"
CHECKS_PASSED=$((CHECKS_PASSED + 1)) # avoid ((var++)): it returns status 1 when var is 0, tripping set -e
}
check_failed() {
echo -e "${RED}✗${NC} $1"
CHECKS_FAILED=$((CHECKS_FAILED + 1))
}
check_warning() {
echo -e "${YELLOW}⚠${NC} $1"
}
# Check 1: macOS
echo -e "${BLUE}[1/8]${NC} Checking operating system..."
if [[ "$OSTYPE" == "darwin"* ]]; then
OS_VERSION=$(sw_vers -productVersion)
check_passed "macOS detected (version $OS_VERSION)"
else
check_failed "Not running on macOS (detected: $OSTYPE)"
echo " iOS Simulator testing requires macOS"
fi
echo ""
# Check 2: Xcode Command Line Tools
echo -e "${BLUE}[2/8]${NC} Checking Xcode Command Line Tools..."
if command -v xcrun &> /dev/null; then
XCODE_PATH=$(xcode-select -p 2>/dev/null || echo "not found")
if [ "$XCODE_PATH" != "not found" ]; then
XCODE_VERSION=$(xcodebuild -version 2>/dev/null | head -n 1 || echo "Unknown")
check_passed "Xcode Command Line Tools installed"
echo " Path: $XCODE_PATH"
echo " Version: $XCODE_VERSION"
else
check_failed "Xcode Command Line Tools path not set"
echo " Run: xcode-select --install"
fi
else
check_failed "xcrun command not found"
echo " Install Xcode Command Line Tools: xcode-select --install"
fi
echo ""
# Check 3: simctl availability
echo -e "${BLUE}[3/8]${NC} Checking simctl (Simulator Control)..."
if command -v xcrun &> /dev/null && xcrun simctl help &> /dev/null; then
check_passed "simctl is available"
else
check_failed "simctl not available"
echo " simctl comes with Xcode Command Line Tools"
fi
echo ""
# Check 4: IDB installation
echo -e "${BLUE}[4/8]${NC} Checking IDB (iOS Development Bridge)..."
if command -v idb &> /dev/null; then
IDB_PATH=$(which idb)
IDB_VERSION=$(idb --version 2>/dev/null || echo "Unknown")
check_passed "IDB is installed"
echo " Path: $IDB_PATH"
echo " Version: $IDB_VERSION"
else
check_warning "IDB not found in PATH"
echo " IDB is optional but provides advanced UI automation"
echo " Install: https://fbidb.io/docs/installation"
echo " Recommended: brew tap facebook/fb && brew install idb-companion"
fi
echo ""
# Check 5: Python 3 installation
echo -e "${BLUE}[5/8]${NC} Checking Python 3..."
if command -v python3 &> /dev/null; then
PYTHON_VERSION=$(python3 --version | cut -d' ' -f2)
check_passed "Python 3 is installed (version $PYTHON_VERSION)"
else
check_failed "Python 3 not found"
echo " Python 3 is required for testing scripts"
echo " Install: brew install python3"
fi
echo ""
# Check 6: Available simulators
echo -e "${BLUE}[6/8]${NC} Checking available iOS Simulators..."
if command -v xcrun &> /dev/null; then
SIMULATOR_COUNT=$(xcrun simctl list devices available 2>/dev/null | grep -cE "iPhone|iPad" || true) # grep -c already prints 0 on no match
if [ "$SIMULATOR_COUNT" -gt 0 ]; then
check_passed "Found $SIMULATOR_COUNT available simulator(s)"
# Show first 5 simulators
echo ""
echo " Available simulators (showing up to 5):"
xcrun simctl list devices available 2>/dev/null | grep -E "iPhone|iPad" | head -5 | while read -r line; do
echo " - $line"
done
else
check_warning "No simulators found"
echo " Create simulators via Xcode or simctl"
echo " Example: xcrun simctl create 'iPhone 15' 'iPhone 15'"
fi
else
check_failed "Cannot check simulators (simctl not available)"
fi
echo ""
# Check 7: Booted simulators
echo -e "${BLUE}[7/8]${NC} Checking booted simulators..."
if command -v xcrun &> /dev/null; then
BOOTED_SIMS=$(xcrun simctl list devices booted 2>/dev/null | grep -cE "iPhone|iPad" || true) # grep -c already prints 0 on no match
if [ "$BOOTED_SIMS" -gt 0 ]; then
check_passed "$BOOTED_SIMS simulator(s) currently booted"
echo ""
echo " Booted simulators:"
xcrun simctl list devices booted 2>/dev/null | grep -E "iPhone|iPad" | while read -r line; do
echo " - $line"
done
else
check_warning "No simulators currently booted"
echo " Boot a simulator to begin testing"
echo " Example: xcrun simctl boot <device-udid>"
echo " Or: open -a Simulator"
fi
else
check_failed "Cannot check booted simulators (simctl not available)"
fi
echo ""
# Check 8: Required Python packages (optional check)
echo -e "${BLUE}[8/8]${NC} Checking Python packages..."
if command -v python3 &> /dev/null; then
MISSING_PACKAGES=()
# Check for PIL/Pillow (for visual_diff.py)
if python3 -c "import PIL" 2>/dev/null; then
check_passed "Pillow (PIL) installed - visual diff available"
else
MISSING_PACKAGES+=("pillow")
check_warning "Pillow (PIL) not installed - visual diff won't work"
fi
if [ ${#MISSING_PACKAGES[@]} -gt 0 ]; then
echo ""
echo " Install missing packages:"
echo " pip3 install ${MISSING_PACKAGES[*]}"
fi
else
check_warning "Cannot check Python packages (Python 3 not available)"
fi
echo ""
# Summary
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo -e "${BLUE} Summary${NC}"
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo ""
echo -e "Checks passed: ${GREEN}$CHECKS_PASSED${NC}"
if [ "$CHECKS_FAILED" -gt 0 ]; then
echo -e "Checks failed: ${RED}$CHECKS_FAILED${NC}"
echo ""
echo -e "${YELLOW}Action required:${NC} Fix the failed checks above before testing"
exit 1
else
echo ""
echo -e "${GREEN}✓ Environment is ready for iOS simulator testing${NC}"
echo ""
echo "Next steps:"
echo " 1. Boot a simulator: open -a Simulator"
echo " 2. Launch your app: xcrun simctl launch booted <bundle-id>"
echo " 3. Run accessibility audit: python scripts/accessibility_audit.py"
exit 0
fi


@@ -0,0 +1,299 @@
#!/usr/bin/env python3
"""
iOS Simulator Listing with Progressive Disclosure
Lists available simulators with token-efficient summaries.
Full details available on demand via cache IDs.
Achieves 96% token reduction (57k→2k tokens) for common queries.
Usage Examples:
# Concise summary (default)
python scripts/sim_list.py
# Get full details for cached list
python scripts/sim_list.py --get-details <cache-id>
# Get recommendations
python scripts/sim_list.py --suggest
# Filter by device type
python scripts/sim_list.py --device-type iPhone
Output (default):
Simulator Summary [cache-sim-20251028-143052]
├─ Total: 47 devices
├─ Available: 31
└─ Booted: 1
✓ iPhone 16 Pro (iOS 18.1) [ABC-123...]
Use --get-details cache-sim-20251028-143052 for full list
Technical Details:
- Uses xcrun simctl list devices
- Caches results with 1-hour TTL
- Reduces output by 96% by default
- Token efficiency: summary = ~30 tokens, full list = ~1500 tokens
"""
import argparse
import json
import subprocess
import sys
from common import get_cache
class SimulatorLister:
"""Lists iOS simulators with progressive disclosure."""
def __init__(self):
"""Initialize lister with cache."""
self.cache = get_cache()
def list_simulators(self) -> dict:
"""
Get list of all simulators.
Returns:
Dict with structure:
{
"devices": [...],
"runtimes": [...],
"total_devices": int,
"available_devices": int,
"booted_devices": [...]
}
"""
try:
result = subprocess.run(
["xcrun", "simctl", "list", "devices", "--json"],
capture_output=True,
text=True,
check=True,
)
return json.loads(result.stdout)
except (subprocess.CalledProcessError, json.JSONDecodeError):
return {"devices": {}, "runtimes": []}
def parse_devices(self, sim_data: dict) -> list[dict]:
"""
Parse simulator data into flat list.
Returns:
List of device dicts with runtime info
"""
devices = []
devices_by_runtime = sim_data.get("devices", {})
for runtime_str, device_list in devices_by_runtime.items():
# Extract iOS version from runtime string
# Format: "iOS 18.1", "tvOS 18", etc.
runtime_name = runtime_str.replace(" Simulator", "").strip()
for device in device_list:
devices.append(
{
"name": device.get("name"),
"udid": device.get("udid"),
"state": device.get("state"),
"runtime": runtime_name,
"is_available": device.get("isAvailable", False),
}
)
return devices
def get_concise_summary(self, devices: list[dict]) -> dict:
"""
Generate concise summary with cache ID.
Returns 96% fewer tokens than full list.
"""
booted = [d for d in devices if d["state"] == "Booted"]
available = [d for d in devices if d["is_available"]]
iphone = [d for d in available if "iPhone" in d["name"]]
# Cache full list for later retrieval
from datetime import datetime
cache_id = self.cache.save(
{
"devices": devices,
"timestamp": datetime.now().isoformat(),
},
"simulator-list",
)
return {
"cache_id": cache_id,
"summary": {
"total_devices": len(devices),
"available_devices": len(available),
"booted_devices": len(booted),
},
"quick_access": {
"booted": booted[:3] if booted else [],
"recommended_iphone": iphone[:3] if iphone else [],
},
}
def get_full_list(
self,
cache_id: str,
device_type: str | None = None,
runtime: str | None = None,
) -> list[dict] | None:
"""
Retrieve full simulator list from cache.
Args:
cache_id: Cache ID from concise summary
device_type: Filter by type (iPhone, iPad, etc.)
runtime: Filter by iOS version
Returns:
List of devices matching filters
"""
data = self.cache.get(cache_id)
if not data:
return None
devices = data.get("devices", [])
# Apply filters
if device_type:
devices = [d for d in devices if device_type in d["name"]]
if runtime:
devices = [d for d in devices if runtime.lower() in d["runtime"].lower()]
return devices
def suggest_simulators(self, limit: int = 4) -> list[dict]:
"""
Get simulator recommendations.
Returns:
List of recommended simulators (best candidates for building)
"""
all_sims = self.list_simulators()
devices = self.parse_devices(all_sims)
# Score devices for recommendations
scored = []
for device in devices:
score = 0
# Prefer booted
if device["state"] == "Booted":
score += 10
# Prefer available
if device["is_available"]:
score += 5
# Prefer recent iOS versions
ios_version = device["runtime"]
if "18" in ios_version:
score += 3
elif "17" in ios_version:
score += 2
# Prefer iPhones over other types
if "iPhone" in device["name"]:
score += 1
scored.append({"device": device, "score": score})
# Sort by score and return top N
scored.sort(key=lambda x: x["score"], reverse=True)
return [s["device"] for s in scored[:limit]]
def format_device(device: dict) -> str:
"""Format device for display."""
state_icon = "●" if device["state"] == "Booted" else " "
avail_icon = "✓" if device["is_available"] else "✗"
name = device["name"]
runtime = device["runtime"]
udid_short = device["udid"][:8] + "..."
return f"{state_icon} {avail_icon} {name} ({runtime}) [{udid_short}]"
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="List iOS simulators with progressive disclosure")
parser.add_argument(
"--get-details",
metavar="CACHE_ID",
help="Get full details for cached simulator list",
)
parser.add_argument("--suggest", action="store_true", help="Get simulator recommendations")
parser.add_argument(
"--device-type",
help="Filter by device type (iPhone, iPad, Apple Watch, etc.)",
)
parser.add_argument("--runtime", help="Filter by iOS version (e.g., iOS-18, iOS-17)")
parser.add_argument("--json", action="store_true", help="Output as JSON")
args = parser.parse_args()
lister = SimulatorLister()
# Get full list with details
if args.get_details:
devices = lister.get_full_list(
args.get_details, device_type=args.device_type, runtime=args.runtime
)
        if devices is None:
            print(f"Error: Cache ID not found or expired: {args.get_details}", file=sys.stderr)
            sys.exit(1)
if args.json:
print(json.dumps(devices, indent=2))
else:
print(f"Simulators ({len(devices)}):\n")
for device in devices:
print(f" {format_device(device)}")
# Get recommendations
elif args.suggest:
suggestions = lister.suggest_simulators()
if args.json:
print(json.dumps(suggestions, indent=2))
else:
print("Recommended Simulators:\n")
for i, device in enumerate(suggestions, 1):
print(f"{i}. {format_device(device)}")
# Default: concise summary
else:
all_sims = lister.list_simulators()
devices = lister.parse_devices(all_sims)
summary = lister.get_concise_summary(devices)
if args.json:
print(json.dumps(summary, indent=2))
else:
# Human-readable concise output
cache_id = summary["cache_id"]
s = summary["summary"]
q = summary["quick_access"]
print(f"Simulator Summary [{cache_id}]")
print(f"├─ Total: {s['total_devices']} devices")
print(f"├─ Available: {s['available_devices']}")
print(f"└─ Booted: {s['booted_devices']}")
if q["booted"]:
print()
for device in q["booted"]:
print(f" {format_device(device)}")
print()
print(f"Use --get-details {cache_id} for full list")
if __name__ == "__main__":
main()

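The recommendation logic above is a simple additive score. As a self-contained sketch of the same heuristic (the weights and field names mirror the script, but the device records here are hypothetical fixtures):

```python
def score_device(device: dict) -> int:
    """Additive preference score: booted > available > recent iOS > iPhone."""
    score = 0
    if device["state"] == "Booted":
        score += 10
    if device["is_available"]:
        score += 5
    if "18" in device["runtime"]:
        score += 3
    elif "17" in device["runtime"]:
        score += 2
    if "iPhone" in device["name"]:
        score += 1
    return score


# Hypothetical fixtures to show the resulting ranking
devices = [
    {"name": "iPad Air", "state": "Shutdown", "is_available": True, "runtime": "iOS 18.0"},
    {"name": "iPhone 16 Pro", "state": "Booted", "is_available": True, "runtime": "iOS 18.0"},
    {"name": "iPhone 15", "state": "Shutdown", "is_available": True, "runtime": "iOS 17.5"},
]
ranked = sorted(devices, key=score_device, reverse=True)
print([d["name"] for d in ranked])  # the booted iPhone ranks first
```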

@@ -0,0 +1,297 @@
#!/usr/bin/env python3
"""
Boot iOS simulators and wait for readiness.
This script boots one or more simulators and optionally waits for them to reach
a ready state. It measures boot time and provides progress feedback.
Key features:
- Boot by UDID or device name
- Wait for device readiness with configurable timeout
- Measure boot performance
- Batch boot operations (boot all, boot by type)
- Progress reporting for CI/CD pipelines
"""
import argparse
import subprocess
import sys
import time
from common.device_utils import (
get_booted_device_udid,
list_simulators,
resolve_device_identifier,
)
class SimulatorBooter:
"""Boot iOS simulators with optional readiness waiting."""
def __init__(self, udid: str | None = None):
"""Initialize booter with optional device UDID."""
self.udid = udid
def boot(self, wait_ready: bool = False, timeout_seconds: int = 120) -> tuple[bool, str]:
"""
Boot simulator and optionally wait for readiness.
Args:
wait_ready: Wait for device to be ready before returning
timeout_seconds: Maximum seconds to wait for readiness
Returns:
(success, message) tuple
"""
if not self.udid:
return False, "Error: Device UDID not specified"
start_time = time.time()
# Check if already booted
try:
booted = get_booted_device_udid()
if booted == self.udid:
elapsed = time.time() - start_time
return True, (f"Device already booted: {self.udid} " f"[checked in {elapsed:.1f}s]")
except RuntimeError:
pass # No booted device, proceed with boot
# Execute boot command
try:
cmd = ["xcrun", "simctl", "boot", self.udid]
result = subprocess.run(cmd, check=False, capture_output=True, text=True, timeout=30)
if result.returncode != 0:
error = result.stderr.strip()
return False, f"Boot failed: {error}"
except subprocess.TimeoutExpired:
return False, "Boot command timed out"
except Exception as e:
return False, f"Boot error: {e}"
# Optionally wait for readiness
if wait_ready:
ready, wait_message = self._wait_for_ready(timeout_seconds)
elapsed = time.time() - start_time
if ready:
return True, (f"Device booted and ready: {self.udid} " f"[{elapsed:.1f}s total]")
return False, wait_message
elapsed = time.time() - start_time
return True, (
f"Device booted: {self.udid} [boot in {elapsed:.1f}s] "
"(use --wait-ready to wait for availability)"
)
def _wait_for_ready(self, timeout_seconds: int = 120) -> tuple[bool, str]:
"""
Wait for device to reach ready state.
Args:
timeout_seconds: Maximum seconds to wait
Returns:
(success, message) tuple
"""
start_time = time.time()
poll_interval = 0.5
checks = 0
while time.time() - start_time < timeout_seconds:
try:
checks += 1
# Check if device responds to simctl commands
result = subprocess.run(
["xcrun", "simctl", "spawn", self.udid, "launchctl", "list"],
check=False,
capture_output=True,
text=True,
timeout=5,
)
if result.returncode == 0:
elapsed = time.time() - start_time
return True, (
f"Device ready: {self.udid} " f"[{elapsed:.1f}s, {checks} checks]"
)
except (subprocess.TimeoutExpired, RuntimeError):
pass # Not ready yet
time.sleep(poll_interval)
elapsed = time.time() - start_time
return False, (
f"Boot timeout: Device did not reach ready state "
f"within {elapsed:.1f}s ({checks} checks)"
)
@staticmethod
def boot_all() -> tuple[int, int]:
"""
Boot all available simulators.
Returns:
(succeeded, failed) tuple with counts
"""
simulators = list_simulators(state="available")
succeeded = 0
failed = 0
for sim in simulators:
booter = SimulatorBooter(udid=sim["udid"])
success, _message = booter.boot(wait_ready=False)
if success:
succeeded += 1
else:
failed += 1
return succeeded, failed
@staticmethod
def boot_by_type(device_type: str) -> tuple[int, int]:
"""
Boot all simulators of a specific type.
Args:
device_type: Device type filter (e.g., "iPhone", "iPad")
Returns:
(succeeded, failed) tuple with counts
"""
simulators = list_simulators(state="available")
succeeded = 0
failed = 0
for sim in simulators:
if device_type.lower() in sim["name"].lower():
booter = SimulatorBooter(udid=sim["udid"])
success, _message = booter.boot(wait_ready=False)
if success:
succeeded += 1
else:
failed += 1
return succeeded, failed
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Boot iOS simulators and wait for readiness")
parser.add_argument(
"--udid",
help="Device UDID or name (required unless using --all or --type)",
)
parser.add_argument(
"--name",
help="Device name (alternative to --udid)",
)
parser.add_argument(
"--wait-ready",
action="store_true",
help="Wait for device to reach ready state",
)
parser.add_argument(
"--timeout",
type=int,
default=120,
help="Timeout for --wait-ready in seconds (default: 120)",
)
parser.add_argument(
"--all",
action="store_true",
help="Boot all available simulators",
)
parser.add_argument(
"--type",
help="Boot all simulators of a specific type (e.g., iPhone, iPad)",
)
parser.add_argument(
"--json",
action="store_true",
help="Output as JSON",
)
args = parser.parse_args()
# Handle batch operations
if args.all:
succeeded, failed = SimulatorBooter.boot_all()
if args.json:
import json
print(
json.dumps(
{
"action": "boot_all",
"succeeded": succeeded,
"failed": failed,
"total": succeeded + failed,
}
)
)
else:
total = succeeded + failed
print(f"Boot summary: {succeeded}/{total} succeeded, " f"{failed} failed")
sys.exit(0 if failed == 0 else 1)
if args.type:
succeeded, failed = SimulatorBooter.boot_by_type(args.type)
if args.json:
import json
print(
json.dumps(
{
"action": "boot_by_type",
"type": args.type,
"succeeded": succeeded,
"failed": failed,
"total": succeeded + failed,
}
)
)
else:
total = succeeded + failed
print(f"Boot {args.type} summary: {succeeded}/{total} succeeded, " f"{failed} failed")
sys.exit(0 if failed == 0 else 1)
# Resolve device identifier
device_id = args.udid or args.name
if not device_id:
print("Error: Specify --udid, --name, --all, or --type", file=sys.stderr)
sys.exit(1)
try:
udid = resolve_device_identifier(device_id)
except RuntimeError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
# Boot device
booter = SimulatorBooter(udid=udid)
success, message = booter.boot(wait_ready=args.wait_ready, timeout_seconds=args.timeout)
if args.json:
import json
print(
json.dumps(
{
"action": "boot",
"device_id": device_id,
"udid": udid,
"success": success,
"message": message,
}
)
)
else:
print(message)
sys.exit(0 if success else 1)
if __name__ == "__main__":
main()

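The `_wait_for_ready` loop above is an instance of a generic poll-until-ready pattern: probe, sleep, repeat until success or deadline. A minimal sketch with the probe injected as a callable, so the loop can be exercised without `simctl`:

```python
import time


def wait_until(probe, timeout_seconds: float, poll_interval: float = 0.05) -> tuple[bool, int]:
    """Poll probe() until it returns True or the deadline passes.

    Returns (ready, number_of_checks), mirroring the (success, checks)
    bookkeeping in SimulatorBooter._wait_for_ready.
    """
    deadline = time.monotonic() + timeout_seconds
    checks = 0
    while time.monotonic() < deadline:
        checks += 1
        if probe():
            return True, checks
        time.sleep(poll_interval)
    return False, checks


# Simulate a device that becomes ready on the third probe
state = {"calls": 0}


def fake_probe() -> bool:
    state["calls"] += 1
    return state["calls"] >= 3


ready, checks = wait_until(fake_probe, timeout_seconds=5)
print(ready, checks)  # True 3
```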

@@ -0,0 +1,316 @@
#!/usr/bin/env python3
"""
Create iOS simulators dynamically.
This script creates new simulators with specified device type and iOS version.
Useful for CI/CD pipelines that need on-demand test device provisioning.
Key features:
- Create by device type (iPhone 16 Pro, iPad Air, etc.)
- Specify iOS version (17.0, 18.0, etc.)
- Custom device naming
- Return newly created device UDID
- List available device types and runtimes
"""
import argparse
import subprocess
import sys
from common.device_utils import list_simulators
class SimulatorCreator:
"""Create iOS simulators with specified configurations."""
def __init__(self):
"""Initialize simulator creator."""
pass
def create(
self,
device_type: str,
ios_version: str | None = None,
custom_name: str | None = None,
) -> tuple[bool, str, str | None]:
"""
Create new iOS simulator.
Args:
device_type: Device type (e.g., "iPhone 16 Pro", "iPad Air")
ios_version: iOS version (e.g., "18.0"). If None, uses latest.
custom_name: Custom device name. If None, uses default.
Returns:
(success, message, new_udid) tuple
"""
# Get available device types and runtimes
available_types = self._get_device_types()
if not available_types:
return False, "Failed to get available device types", None
# Normalize device type
device_type_id = None
for dt in available_types:
if device_type.lower() in dt["name"].lower():
device_type_id = dt["identifier"]
break
if not device_type_id:
return (
False,
f"Device type '{device_type}' not found. "
f"Use --list-devices for available types.",
None,
)
# Get available runtimes
available_runtimes = self._get_runtimes()
if not available_runtimes:
return False, "Failed to get available runtimes", None
# Resolve iOS version
runtime_id = None
if ios_version:
for rt in available_runtimes:
if ios_version in rt["name"]:
runtime_id = rt["identifier"]
break
if not runtime_id:
return (
False,
f"iOS version '{ios_version}' not found. "
f"Use --list-runtimes for available versions.",
None,
)
        # Use latest runtime (the list is sorted newest-first, so take index 0)
        elif available_runtimes:
            runtime_id = available_runtimes[0]["identifier"]
if not runtime_id:
return False, "No iOS runtime available", None
# Create device
try:
# Build device name
device_name = (
custom_name or f"{device_type_id.split('.')[-1]}-{ios_version or 'latest'}"
)
cmd = [
"xcrun",
"simctl",
"create",
device_name,
device_type_id,
runtime_id,
]
result = subprocess.run(cmd, check=False, capture_output=True, text=True, timeout=60)
if result.returncode != 0:
error = result.stderr.strip() or result.stdout.strip()
return False, f"Creation failed: {error}", None
# Extract UDID from output
new_udid = result.stdout.strip()
return (
True,
f"Device created: {device_name} ({device_type}) iOS {ios_version or 'latest'} "
f"UDID: {new_udid}",
new_udid,
)
except subprocess.TimeoutExpired:
return False, "Creation command timed out", None
except Exception as e:
return False, f"Creation error: {e}", None
@staticmethod
def _get_device_types() -> list[dict]:
"""
Get available device types.
Returns:
List of device type dicts with "name" and "identifier" keys
"""
try:
cmd = ["xcrun", "simctl", "list", "devicetypes", "-j"]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
import json
data = json.loads(result.stdout)
devices = []
for device in data.get("devicetypes", []):
devices.append(
{
"name": device.get("name", ""),
"identifier": device.get("identifier", ""),
}
)
return devices
except Exception:
return []
@staticmethod
def _get_runtimes() -> list[dict]:
"""
Get available iOS runtimes.
Returns:
List of runtime dicts with "name" and "identifier" keys
"""
try:
cmd = ["xcrun", "simctl", "list", "runtimes", "-j"]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
import json
data = json.loads(result.stdout)
runtimes = []
for runtime in data.get("runtimes", []):
# Only include iOS runtimes (skip watchOS, tvOS, etc.)
identifier = runtime.get("identifier", "")
if "iOS" in identifier or "iOS" in runtime.get("name", ""):
runtimes.append(
{
"name": runtime.get("name", ""),
"identifier": runtime.get("identifier", ""),
}
)
            # Sort by version number (latest first); parse the numeric
            # components so e.g. iOS-18 orders above iOS-9, which a plain
            # lexicographic sort on the identifier string would get wrong
            import re

            def _version_key(rt: dict) -> tuple[int, ...]:
                return tuple(int(n) for n in re.findall(r"\d+", rt.get("identifier", ""))) or (0,)

            runtimes.sort(key=_version_key, reverse=True)
return runtimes
except Exception:
return []
@staticmethod
def list_device_types() -> list[dict]:
"""
List all available device types.
Returns:
List of device types with name and identifier
"""
return SimulatorCreator._get_device_types()
@staticmethod
def list_runtimes() -> list[dict]:
"""
List all available iOS runtimes.
Returns:
List of runtimes with name and identifier
"""
return SimulatorCreator._get_runtimes()
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Create iOS simulators dynamically")
parser.add_argument(
"--device",
required=False,
help="Device type (e.g., 'iPhone 16 Pro', 'iPad Air')",
)
parser.add_argument(
"--runtime",
help="iOS version (e.g., '18.0', '17.0'). Defaults to latest.",
)
parser.add_argument(
"--name",
help="Custom device name. Defaults to auto-generated.",
)
parser.add_argument(
"--list-devices",
action="store_true",
help="List all available device types",
)
parser.add_argument(
"--list-runtimes",
action="store_true",
help="List all available iOS runtimes",
)
parser.add_argument(
"--json",
action="store_true",
help="Output as JSON",
)
args = parser.parse_args()
creator = SimulatorCreator()
# Handle info queries
if args.list_devices:
devices = creator.list_device_types()
if args.json:
import json
print(json.dumps({"devices": devices}))
else:
print(f"Available device types ({len(devices)}):")
for dev in devices[:20]: # Show first 20
print(f" - {dev['name']}")
if len(devices) > 20:
print(f" ... and {len(devices) - 20} more")
sys.exit(0)
if args.list_runtimes:
runtimes = creator.list_runtimes()
if args.json:
import json
print(json.dumps({"runtimes": runtimes}))
else:
print(f"Available iOS runtimes ({len(runtimes)}):")
for rt in runtimes:
print(f" - {rt['name']}")
sys.exit(0)
# Create device
if not args.device:
print(
"Error: Specify --device, --list-devices, or --list-runtimes",
file=sys.stderr,
)
sys.exit(1)
success, message, new_udid = creator.create(
device_type=args.device,
ios_version=args.runtime,
custom_name=args.name,
)
if args.json:
import json
print(
json.dumps(
{
"action": "create",
"device_type": args.device,
"runtime": args.runtime,
"success": success,
"message": message,
"new_udid": new_udid,
}
)
)
else:
print(message)
sys.exit(0 if success else 1)
if __name__ == "__main__":
main()

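Runtime identifiers like `com.apple.CoreSimulator.SimRuntime.iOS-18-0` should be ordered by their numeric components rather than as strings, since lexicographically `"iOS-9"` sorts above `"iOS-18"`. A small sketch of such a sort key (the identifiers are illustrative examples):

```python
import re


def runtime_key(identifier: str) -> tuple[int, ...]:
    """Numeric sort key for a runtime identifier; (0,) for unparsable input."""
    return tuple(int(n) for n in re.findall(r"\d+", identifier)) or (0,)


ids = [
    "com.apple.CoreSimulator.SimRuntime.iOS-9-2",
    "com.apple.CoreSimulator.SimRuntime.iOS-18-0",
    "com.apple.CoreSimulator.SimRuntime.iOS-17-5",
]
latest_first = sorted(ids, key=runtime_key, reverse=True)
print(latest_first[0])  # the iOS-18-0 identifier
```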

@@ -0,0 +1,357 @@
#!/usr/bin/env python3
"""
Delete iOS simulators permanently.
This script permanently removes simulators and frees disk space.
Includes safety confirmation to prevent accidental deletion.
Key features:
- Delete by UDID or device name
- Confirmation required for safety
- Batch delete operations
- Report freed disk space estimate
"""
import argparse
import subprocess
import sys
from common.device_utils import (
list_simulators,
resolve_device_identifier,
)
class SimulatorDeleter:
"""Delete iOS simulators with safety confirmation."""
def __init__(self, udid: str | None = None):
"""Initialize with optional device UDID."""
self.udid = udid
def delete(self, confirm: bool = False) -> tuple[bool, str]:
"""
Delete simulator permanently.
Args:
confirm: Skip confirmation prompt (for batch operations)
Returns:
(success, message) tuple
"""
if not self.udid:
return False, "Error: Device UDID not specified"
# Safety confirmation
if not confirm:
try:
response = input(
f"Permanently delete simulator {self.udid}? " f"(type 'yes' to confirm): "
)
if response.lower() != "yes":
return False, "Deletion cancelled by user"
except KeyboardInterrupt:
return False, "Deletion cancelled"
# Execute delete command
try:
cmd = ["xcrun", "simctl", "delete", self.udid]
result = subprocess.run(cmd, check=False, capture_output=True, text=True, timeout=60)
if result.returncode != 0:
error = result.stderr.strip() or result.stdout.strip()
return False, f"Deletion failed: {error}"
return True, f"Device deleted: {self.udid} [disk space freed]"
except subprocess.TimeoutExpired:
return False, "Deletion command timed out"
except Exception as e:
return False, f"Deletion error: {e}"
@staticmethod
def delete_all(confirm: bool = False) -> tuple[int, int]:
"""
Delete all simulators permanently.
Args:
confirm: Skip confirmation prompt
Returns:
(succeeded, failed) tuple with counts
"""
simulators = list_simulators(state=None)
if not confirm:
count = len(simulators)
try:
response = input(
f"Permanently delete ALL {count} simulators? " f"(type 'yes' to confirm): "
)
if response.lower() != "yes":
return 0, count
except KeyboardInterrupt:
return 0, count
succeeded = 0
failed = 0
for sim in simulators:
deleter = SimulatorDeleter(udid=sim["udid"])
success, _message = deleter.delete(confirm=True)
if success:
succeeded += 1
else:
failed += 1
return succeeded, failed
@staticmethod
def delete_by_type(device_type: str, confirm: bool = False) -> tuple[int, int]:
"""
Delete all simulators of a specific type.
Args:
device_type: Device type filter (e.g., "iPhone", "iPad")
confirm: Skip confirmation prompt
Returns:
(succeeded, failed) tuple with counts
"""
simulators = list_simulators(state=None)
matching = [s for s in simulators if device_type.lower() in s["name"].lower()]
if not matching:
return 0, 0
if not confirm:
count = len(matching)
try:
response = input(
f"Permanently delete {count} {device_type} simulators? "
f"(type 'yes' to confirm): "
)
if response.lower() != "yes":
return 0, count
except KeyboardInterrupt:
return 0, count
succeeded = 0
failed = 0
for sim in matching:
deleter = SimulatorDeleter(udid=sim["udid"])
success, _message = deleter.delete(confirm=True)
if success:
succeeded += 1
else:
failed += 1
return succeeded, failed
@staticmethod
def delete_old(keep_count: int = 3, confirm: bool = False) -> tuple[int, int]:
"""
Delete older simulators, keeping most recent versions.
Useful for cleanup after testing multiple iOS versions.
Keeps the most recent N simulators of each type.
Args:
keep_count: Number of recent simulators to keep per type (default: 3)
confirm: Skip confirmation prompt
Returns:
(succeeded, failed) tuple with counts
"""
simulators = list_simulators(state=None)
# Group by device type
by_type: dict[str, list] = {}
for sim in simulators:
dev_type = sim["type"]
if dev_type not in by_type:
by_type[dev_type] = []
by_type[dev_type].append(sim)
# Find candidates for deletion (older ones)
to_delete = []
for _dev_type, sims in by_type.items():
# Sort by runtime (iOS version) - keep newest
sorted_sims = sorted(sims, key=lambda s: s["runtime"], reverse=True)
# Mark older ones for deletion
to_delete.extend(sorted_sims[keep_count:])
if not to_delete:
return 0, 0
if not confirm:
count = len(to_delete)
try:
response = input(
f"Delete {count} older simulators, keeping {keep_count} per type? "
f"(type 'yes' to confirm): "
)
if response.lower() != "yes":
return 0, count
except KeyboardInterrupt:
return 0, count
succeeded = 0
failed = 0
for sim in to_delete:
deleter = SimulatorDeleter(udid=sim["udid"])
success, _message = deleter.delete(confirm=True)
if success:
succeeded += 1
else:
failed += 1
return succeeded, failed
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Delete iOS simulators permanently")
parser.add_argument(
"--udid",
help="Device UDID or name (required unless using batch options)",
)
parser.add_argument(
"--name",
help="Device name (alternative to --udid)",
)
parser.add_argument(
"--yes",
action="store_true",
help="Skip confirmation prompt",
)
parser.add_argument(
"--all",
action="store_true",
help="Delete all simulators",
)
parser.add_argument(
"--type",
help="Delete all simulators of a specific type (e.g., iPhone)",
)
parser.add_argument(
"--old",
type=int,
metavar="KEEP_COUNT",
help="Delete older simulators, keeping this many per type (e.g., --old 3)",
)
parser.add_argument(
"--json",
action="store_true",
help="Output as JSON",
)
args = parser.parse_args()
# Handle batch operations
if args.all:
succeeded, failed = SimulatorDeleter.delete_all(confirm=args.yes)
if args.json:
import json
print(
json.dumps(
{
"action": "delete_all",
"succeeded": succeeded,
"failed": failed,
"total": succeeded + failed,
}
)
)
else:
total = succeeded + failed
print(f"Delete summary: {succeeded}/{total} succeeded, " f"{failed} failed")
sys.exit(0 if failed == 0 else 1)
if args.type:
succeeded, failed = SimulatorDeleter.delete_by_type(args.type, confirm=args.yes)
if args.json:
import json
print(
json.dumps(
{
"action": "delete_by_type",
"type": args.type,
"succeeded": succeeded,
"failed": failed,
"total": succeeded + failed,
}
)
)
else:
total = succeeded + failed
print(f"Delete {args.type} summary: {succeeded}/{total} succeeded, " f"{failed} failed")
sys.exit(0 if failed == 0 else 1)
if args.old is not None:
succeeded, failed = SimulatorDeleter.delete_old(keep_count=args.old, confirm=args.yes)
if args.json:
import json
print(
json.dumps(
{
"action": "delete_old",
"keep_count": args.old,
"succeeded": succeeded,
"failed": failed,
"total": succeeded + failed,
}
)
)
else:
total = succeeded + failed
print(
f"Delete old summary: {succeeded}/{total} succeeded, "
f"{failed} failed (kept {args.old} per type)"
)
sys.exit(0 if failed == 0 else 1)
# Delete single device
device_id = args.udid or args.name
if not device_id:
print("Error: Specify --udid, --name, --all, --type, or --old", file=sys.stderr)
sys.exit(1)
try:
udid = resolve_device_identifier(device_id)
except RuntimeError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
# Delete device
deleter = SimulatorDeleter(udid=udid)
success, message = deleter.delete(confirm=args.yes)
if args.json:
import json
print(
json.dumps(
{
"action": "delete",
"device_id": device_id,
"udid": udid,
"success": success,
"message": message,
}
)
)
else:
print(message)
sys.exit(0 if success else 1)
if __name__ == "__main__":
main()

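The `delete_old` cleanup above groups simulators by device type and keeps only the newest N of each group. The grouping-and-selection step in isolation, as a sketch over hypothetical fixture records (it assumes each record carries a comparable `runtime` field, as the script does):

```python
from collections import defaultdict


def select_old(simulators: list[dict], keep_count: int) -> list[dict]:
    """Return the simulators beyond the newest keep_count of each type."""
    by_type: dict[str, list[dict]] = defaultdict(list)
    for sim in simulators:
        by_type[sim["type"]].append(sim)
    to_delete = []
    for sims in by_type.values():
        newest_first = sorted(sims, key=lambda s: s["runtime"], reverse=True)
        to_delete.extend(newest_first[keep_count:])
    return to_delete


sims = [
    {"type": "iPhone", "runtime": "iOS 18.0", "udid": "A"},
    {"type": "iPhone", "runtime": "iOS 17.5", "udid": "B"},
    {"type": "iPhone", "runtime": "iOS 17.0", "udid": "C"},
    {"type": "iPad", "runtime": "iOS 18.0", "udid": "D"},
]
old = select_old(sims, keep_count=2)
print([s["udid"] for s in old])  # ['C']
```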

@@ -0,0 +1,342 @@
#!/usr/bin/env python3
"""
Erase iOS simulators (factory reset).
This script performs a factory reset on simulators, returning them to
a clean state while preserving the device UUID. Much faster than
delete + create for CI/CD cleanup.
Key features:
- Erase by UDID or device name
- Preserve device UUID (faster than delete)
- Verify erase completion
- Batch erase operations (all, by type)
"""
import argparse
import subprocess
import sys
import time
from common.device_utils import (
list_simulators,
resolve_device_identifier,
)
class SimulatorEraser:
"""Erase iOS simulators with optional verification."""
def __init__(self, udid: str | None = None):
"""Initialize with optional device UDID."""
self.udid = udid
def erase(self, verify: bool = True, timeout_seconds: int = 30) -> tuple[bool, str]:
"""
Erase simulator and optionally verify completion.
Performs a factory reset, clearing all app data and settings
while preserving the simulator UUID.
Args:
verify: Wait for erase to complete and verify state
timeout_seconds: Maximum seconds to wait for verification
Returns:
(success, message) tuple
"""
if not self.udid:
return False, "Error: Device UDID not specified"
start_time = time.time()
# Execute erase command
try:
cmd = ["xcrun", "simctl", "erase", self.udid]
result = subprocess.run(cmd, check=False, capture_output=True, text=True, timeout=60)
if result.returncode != 0:
error = result.stderr.strip()
return False, f"Erase failed: {error}"
except subprocess.TimeoutExpired:
return False, "Erase command timed out"
except Exception as e:
return False, f"Erase error: {e}"
# Optionally verify erase completion
if verify:
ready, verify_message = self._verify_erase(timeout_seconds)
elapsed = time.time() - start_time
if ready:
return True, (
f"Device erased: {self.udid} " f"[factory reset complete, {elapsed:.1f}s]"
)
return False, verify_message
elapsed = time.time() - start_time
return True, (
f"Device erase initiated: {self.udid} [{elapsed:.1f}s] "
"(use --verify to wait for completion)"
)
def _verify_erase(self, timeout_seconds: int = 30) -> tuple[bool, str]:
"""
Verify erase has completed.
Polls device state to confirm erase finished successfully.
Args:
timeout_seconds: Maximum seconds to wait
Returns:
(success, message) tuple
"""
start_time = time.time()
poll_interval = 0.5
checks = 0
while time.time() - start_time < timeout_seconds:
try:
checks += 1
# Check if device can be queried (indicates boot status)
result = subprocess.run(
["xcrun", "simctl", "spawn", self.udid, "launchctl", "list"],
check=False,
capture_output=True,
text=True,
timeout=5,
)
# Device responding = erase likely complete
if result.returncode == 0:
elapsed = time.time() - start_time
return True, (
f"Erase verified: {self.udid} " f"[{elapsed:.1f}s, {checks} checks]"
)
except (subprocess.TimeoutExpired, RuntimeError):
pass # Not ready yet, keep polling
time.sleep(poll_interval)
elapsed = time.time() - start_time
return False, (
f"Erase verification timeout: Device did not respond "
f"within {elapsed:.1f}s ({checks} checks)"
)
@staticmethod
def erase_all() -> tuple[int, int]:
"""
Erase all simulators (factory reset).
Returns:
(succeeded, failed) tuple with counts
"""
simulators = list_simulators(state=None)
succeeded = 0
failed = 0
for sim in simulators:
eraser = SimulatorEraser(udid=sim["udid"])
success, _message = eraser.erase(verify=False)
if success:
succeeded += 1
else:
failed += 1
return succeeded, failed
@staticmethod
def erase_by_type(device_type: str) -> tuple[int, int]:
"""
Erase all simulators of a specific type.
Args:
device_type: Device type filter (e.g., "iPhone", "iPad")
Returns:
(succeeded, failed) tuple with counts
"""
simulators = list_simulators(state=None)
succeeded = 0
failed = 0
for sim in simulators:
if device_type.lower() in sim["name"].lower():
eraser = SimulatorEraser(udid=sim["udid"])
success, _message = eraser.erase(verify=False)
if success:
succeeded += 1
else:
failed += 1
return succeeded, failed
@staticmethod
def erase_booted() -> tuple[int, int]:
"""
Erase all currently booted simulators.
Returns:
(succeeded, failed) tuple with counts
"""
simulators = list_simulators(state="booted")
succeeded = 0
failed = 0
for sim in simulators:
eraser = SimulatorEraser(udid=sim["udid"])
success, _message = eraser.erase(verify=False)
if success:
succeeded += 1
else:
failed += 1
return succeeded, failed
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Erase iOS simulators (factory reset)")
parser.add_argument(
"--udid",
help="Device UDID or name (required unless using --all, --type, or --booted)",
)
parser.add_argument(
"--name",
help="Device name (alternative to --udid)",
)
parser.add_argument(
"--verify",
action="store_true",
help="Wait for erase to complete and verify state",
)
parser.add_argument(
"--timeout",
type=int,
default=30,
help="Timeout for --verify in seconds (default: 30)",
)
parser.add_argument(
"--all",
action="store_true",
help="Erase all simulators (factory reset)",
)
parser.add_argument(
"--type",
help="Erase all simulators of a specific type (e.g., iPhone)",
)
parser.add_argument(
"--booted",
action="store_true",
help="Erase all currently booted simulators",
)
parser.add_argument(
"--json",
action="store_true",
help="Output as JSON",
)
args = parser.parse_args()
# Handle batch operations
if args.all:
succeeded, failed = SimulatorEraser.erase_all()
if args.json:
import json
print(
json.dumps(
{
"action": "erase_all",
"succeeded": succeeded,
"failed": failed,
"total": succeeded + failed,
}
)
)
else:
total = succeeded + failed
print(f"Erase summary: {succeeded}/{total} succeeded, " f"{failed} failed")
sys.exit(0 if failed == 0 else 1)
if args.type:
succeeded, failed = SimulatorEraser.erase_by_type(args.type)
if args.json:
import json
print(
json.dumps(
{
"action": "erase_by_type",
"type": args.type,
"succeeded": succeeded,
"failed": failed,
"total": succeeded + failed,
}
)
)
else:
total = succeeded + failed
print(f"Erase {args.type} summary: {succeeded}/{total} succeeded, " f"{failed} failed")
sys.exit(0 if failed == 0 else 1)
if args.booted:
succeeded, failed = SimulatorEraser.erase_booted()
if args.json:
import json
print(
json.dumps(
{
"action": "erase_booted",
"succeeded": succeeded,
"failed": failed,
"total": succeeded + failed,
}
)
)
else:
total = succeeded + failed
print(f"Erase booted summary: {succeeded}/{total} succeeded, " f"{failed} failed")
sys.exit(0 if failed == 0 else 1)
# Erase single device
device_id = args.udid or args.name
if not device_id:
print("Error: Specify --udid, --name, --all, --type, or --booted", file=sys.stderr)
sys.exit(1)
try:
udid = resolve_device_identifier(device_id)
except RuntimeError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
# Erase device
eraser = SimulatorEraser(udid=udid)
success, message = eraser.erase(verify=args.verify, timeout_seconds=args.timeout)
if args.json:
import json
print(
json.dumps(
{
"action": "erase",
"device_id": device_id,
"udid": udid,
"success": success,
"message": message,
}
)
)
else:
print(message)
sys.exit(0 if success else 1)
if __name__ == "__main__":
main()

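Each destructive operation above gates on the user typing an explicit `yes` unless a skip flag is passed. That guard can be factored out with the prompt function injected, which makes it testable without interactive input (a sketch; the scripts above inline this logic rather than using a helper):

```python
def confirmed(prompt: str, skip: bool = False, ask=input) -> bool:
    """Return True if the action may proceed: either pre-confirmed via skip,
    or the user typed exactly 'yes' (case-insensitive) at the prompt."""
    if skip:
        return True
    try:
        return ask(prompt).strip().lower() == "yes"
    except KeyboardInterrupt:
        return False


# Injected responses stand in for interactive input
print(confirmed("Delete? ", ask=lambda _: "yes"))  # True
print(confirmed("Delete? ", ask=lambda _: "no"))   # False
print(confirmed("Delete? ", skip=True))            # True
```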

@@ -0,0 +1,290 @@
#!/usr/bin/env python3
"""
Shutdown iOS simulators with optional state verification.
This script shuts down one or more running simulators and optionally
verifies completion. Supports batch operations for efficient cleanup.
Key features:
- Shutdown by UDID or device name
- Verify shutdown completion with timeout
- Batch shutdown operations (all, by type)
- Progress reporting for CI/CD pipelines
"""
import argparse
import subprocess
import sys
import time
from common.device_utils import (
list_simulators,
resolve_device_identifier,
)
class SimulatorShutdown:
"""Shutdown iOS simulators with optional verification."""
def __init__(self, udid: str | None = None):
"""Initialize with optional device UDID."""
self.udid = udid
def shutdown(self, verify: bool = True, timeout_seconds: int = 30) -> tuple[bool, str]:
"""
Shutdown simulator and optionally verify completion.
Args:
verify: Wait for shutdown to complete
timeout_seconds: Maximum seconds to wait for shutdown
Returns:
(success, message) tuple
"""
if not self.udid:
return False, "Error: Device UDID not specified"
start_time = time.time()
# Check if already shutdown
simulators = list_simulators(state="booted")
if not any(s["udid"] == self.udid for s in simulators):
elapsed = time.time() - start_time
return True, (f"Device already shutdown: {self.udid} " f"[checked in {elapsed:.1f}s]")
# Execute shutdown command
try:
cmd = ["xcrun", "simctl", "shutdown", self.udid]
result = subprocess.run(cmd, check=False, capture_output=True, text=True, timeout=30)
if result.returncode != 0:
error = result.stderr.strip()
return False, f"Shutdown failed: {error}"
except subprocess.TimeoutExpired:
return False, "Shutdown command timed out"
except Exception as e:
return False, f"Shutdown error: {e}"
# Optionally verify shutdown
if verify:
ready, verify_message = self._verify_shutdown(timeout_seconds)
elapsed = time.time() - start_time
if ready:
return True, (f"Device shutdown confirmed: {self.udid} " f"[{elapsed:.1f}s total]")
return False, verify_message
elapsed = time.time() - start_time
return True, (
f"Device shutdown: {self.udid} [{elapsed:.1f}s] "
"(use --verify to wait for confirmation)"
)
def _verify_shutdown(self, timeout_seconds: int = 30) -> tuple[bool, str]:
"""
Verify device has fully shutdown.
Args:
timeout_seconds: Maximum seconds to wait
Returns:
(success, message) tuple
"""
start_time = time.time()
poll_interval = 0.5
checks = 0
while time.time() - start_time < timeout_seconds:
try:
checks += 1
# Check booted devices
simulators = list_simulators(state="booted")
if not any(s["udid"] == self.udid for s in simulators):
elapsed = time.time() - start_time
return True, (
f"Device shutdown verified: {self.udid} "
f"[{elapsed:.1f}s, {checks} checks]"
)
except RuntimeError:
pass # Error checking, retry
time.sleep(poll_interval)
elapsed = time.time() - start_time
return False, (
f"Shutdown verification timeout: Device did not fully shut down "
f"within {elapsed:.1f}s ({checks} checks)"
)
@staticmethod
def shutdown_all() -> tuple[int, int]:
"""
Shut down all booted simulators.
Returns:
(succeeded, failed) tuple with counts
"""
simulators = list_simulators(state="booted")
succeeded = 0
failed = 0
for sim in simulators:
shutdown = SimulatorShutdown(udid=sim["udid"])
success, _message = shutdown.shutdown(verify=False)
if success:
succeeded += 1
else:
failed += 1
return succeeded, failed
@staticmethod
def shutdown_by_type(device_type: str) -> tuple[int, int]:
"""
Shut down all booted simulators of a specific type.
Args:
device_type: Device type filter (e.g., "iPhone", "iPad")
Returns:
(succeeded, failed) tuple with counts
"""
simulators = list_simulators(state="booted")
succeeded = 0
failed = 0
for sim in simulators:
if device_type.lower() in sim["name"].lower():
shutdown = SimulatorShutdown(udid=sim["udid"])
success, _message = shutdown.shutdown(verify=False)
if success:
succeeded += 1
else:
failed += 1
return succeeded, failed
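The type filter above is a plain case-insensitive substring match against the device name. A quick sketch with hypothetical device records (mirroring the dict shape `list_simulators()` returns):

```python
# Hypothetical booted-device records
booted = [
    {"name": "iPhone 16 Pro", "udid": "AAA"},
    {"name": "iPad Air 11-inch", "udid": "BBB"},
]

device_type = "iphone"  # matching is case-insensitive
matches = [s["udid"] for s in booted if device_type.lower() in s["name"].lower()]
print(matches)  # ['AAA']
```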
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(
description="Shutdown iOS simulators with optional verification"
)
parser.add_argument(
"--udid",
help="Device UDID or name (required unless using --all or --type)",
)
parser.add_argument(
"--name",
help="Device name (alternative to --udid)",
)
parser.add_argument(
"--verify",
action="store_true",
help="Wait for shutdown to complete and verify state",
)
parser.add_argument(
"--timeout",
type=int,
default=30,
help="Timeout for --verify in seconds (default: 30)",
)
parser.add_argument(
"--all",
action="store_true",
help="Shutdown all booted simulators",
)
parser.add_argument(
"--type",
help="Shutdown all booted simulators of a specific type (e.g., iPhone)",
)
parser.add_argument(
"--json",
action="store_true",
help="Output as JSON",
)
args = parser.parse_args()
# Handle batch operations
if args.all:
succeeded, failed = SimulatorShutdown.shutdown_all()
if args.json:
import json
print(
json.dumps(
{
"action": "shutdown_all",
"succeeded": succeeded,
"failed": failed,
"total": succeeded + failed,
}
)
)
else:
total = succeeded + failed
print(f"Shutdown summary: {succeeded}/{total} succeeded, " f"{failed} failed")
sys.exit(0 if failed == 0 else 1)
if args.type:
succeeded, failed = SimulatorShutdown.shutdown_by_type(args.type)
if args.json:
import json
print(
json.dumps(
{
"action": "shutdown_by_type",
"type": args.type,
"succeeded": succeeded,
"failed": failed,
"total": succeeded + failed,
}
)
)
else:
total = succeeded + failed
print(
f"Shutdown {args.type} summary: {succeeded}/{total} succeeded, " f"{failed} failed"
)
sys.exit(0 if failed == 0 else 1)
# Resolve device identifier
device_id = args.udid or args.name
if not device_id:
print("Error: Specify --udid, --name, --all, or --type", file=sys.stderr)
sys.exit(1)
try:
udid = resolve_device_identifier(device_id)
except RuntimeError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
# Shutdown device
shutdown = SimulatorShutdown(udid=udid)
success, message = shutdown.shutdown(verify=args.verify, timeout_seconds=args.timeout)
if args.json:
import json
print(
json.dumps(
{
"action": "shutdown",
"device_id": device_id,
"udid": udid,
"success": success,
"message": message,
}
)
)
else:
print(message)
sys.exit(0 if success else 1)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,375 @@
#!/usr/bin/env python3
"""
Intelligent Simulator Selector
Suggests the best available iOS simulators based on:
- Recently used (from config)
- Latest iOS version
- Common models for testing
- Boot status
Usage Examples:
# Get suggestions for user selection
python scripts/simulator_selector.py --suggest
# List all available simulators
python scripts/simulator_selector.py --list
# Boot a specific simulator
python scripts/simulator_selector.py --boot "67A99DF0-27BD-4507-A3DE-B7D8C38F764A"
# Get suggestions as JSON for programmatic use
python scripts/simulator_selector.py --suggest --json
"""
import argparse
import json
import re
import subprocess
import sys
# Try to import config from build_and_test if available
try:
from xcode.config import Config
except ImportError:
Config = None
class SimulatorInfo:
"""Information about an iOS simulator."""
def __init__(
self,
name: str,
udid: str,
ios_version: str,
status: str,
):
"""Initialize simulator info."""
self.name = name
self.udid = udid
self.ios_version = ios_version
self.status = status
self.reasons: list[str] = []
def to_dict(self) -> dict:
"""Convert to dictionary."""
return {
"device": self.name,
"udid": self.udid,
"ios": self.ios_version,
"status": self.status,
"reasons": self.reasons,
}
class SimulatorSelector:
"""Intelligent simulator selection."""
# Common iPhone models ranked by testing priority
COMMON_MODELS = [
"iPhone 16 Pro",
"iPhone 16",
"iPhone 15 Pro",
"iPhone 15",
"iPhone SE (3rd generation)",
]
def __init__(self):
"""Initialize selector."""
self.simulators: list[SimulatorInfo] = []
self.config: dict | None = None
self.last_used_simulator: str | None = None
# Load config if available
if Config:
try:
config = Config.load()
self.last_used_simulator = config.get_preferred_simulator()
except Exception:
pass
def list_simulators(self) -> list[SimulatorInfo]:
"""
List all available iPhone simulators (other device types are filtered out).
Returns:
List of SimulatorInfo objects
"""
try:
result = subprocess.run(
["xcrun", "simctl", "list", "devices", "--json"],
capture_output=True,
text=True,
check=True,
)
data = json.loads(result.stdout)
simulators = []
# Parse devices by iOS version
for runtime, devices in data.get("devices", {}).items():
# Extract iOS version from runtime (e.g., "com.apple.CoreSimulator.SimRuntime.iOS-18-0")
ios_version_match = re.search(r"iOS-(\d+-\d+)", runtime)
if not ios_version_match:
continue
ios_version = ios_version_match.group(1).replace("-", ".")
for device in devices:
name = device.get("name", "")
udid = device.get("udid", "")
is_available = device.get("isAvailable", False)
if not is_available or "iPhone" not in name:
continue
status = device.get("state", "").capitalize()
sim_info = SimulatorInfo(name, udid, ios_version, status)
simulators.append(sim_info)
self.simulators = simulators
return simulators
except subprocess.CalledProcessError as e:
print(f"Error listing simulators: {e.stderr}", file=sys.stderr)
return []
except json.JSONDecodeError as e:
print(f"Error parsing simulator list: {e}", file=sys.stderr)
return []
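The runtime-identifier parsing above can be checked in isolation. The runtime string below is a typical example of the identifier format, not taken from a live machine:

```python
import re

runtime = "com.apple.CoreSimulator.SimRuntime.iOS-18-0"
match = re.search(r"iOS-(\d+-\d+)", runtime)
assert match is not None
ios_version = match.group(1).replace("-", ".")
print(ios_version)  # 18.0
```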
def get_suggestions(self, count: int = 4) -> list[SimulatorInfo]:
"""
Get top N suggested simulators.
Ranking factors:
1. Recently used (from config)
2. Latest iOS version
3. Common models
4. Boot status (Booted preferred)
Args:
count: Number of suggestions to return
Returns:
List of suggested SimulatorInfo objects
"""
if not self.simulators:
return []
# Score each simulator
scored = []
for sim in self.simulators:
score = self._score_simulator(sim)
scored.append((score, sim))
# Sort by score (descending)
scored.sort(key=lambda x: x[0], reverse=True)
# Return top N
suggestions = [sim for _, sim in scored[:count]]
# Add reasons to each suggestion
for i, sim in enumerate(suggestions, 1):
if i == 1:
sim.reasons.append("Recommended")
# Check if recently used
if self.last_used_simulator and self.last_used_simulator == sim.name:
sim.reasons.append("Recently used")
# Check if latest iOS
latest_ios = max(s.ios_version for s in self.simulators)
if sim.ios_version == latest_ios:
sim.reasons.append("Latest iOS")
# Check if common model
for j, model in enumerate(self.COMMON_MODELS):
if model in sim.name:
sim.reasons.append(f"#{j+1} common model")
break
# Check if booted
if sim.status == "Booted":
sim.reasons.append("Currently running")
return suggestions
def _score_simulator(self, sim: SimulatorInfo) -> float:
"""
Score a simulator for ranking.
Higher score = better recommendation.
Args:
sim: Simulator to score
Returns:
Score value
"""
score = 0.0
# Recently used gets highest priority (100 points)
if self.last_used_simulator and self.last_used_simulator == sim.name:
score += 100
# Latest iOS version (50 points)
latest_ios = max(s.ios_version for s in self.simulators)
if sim.ios_version == latest_ios:
score += 50
# Common models (30-20 points based on ranking)
for i, model in enumerate(self.COMMON_MODELS):
if model in sim.name:
score += 30 - (i * 2) # Higher ranking models get more points
break
# Currently booted (10 points)
if sim.status == "Booted":
score += 10
# iOS version as a numeric tie-breaker; compare major/minor properly
# rather than concatenating digits ("18.10" must outrank "18.2")
major, _, minor = sim.ios_version.partition(".")
score += (int(major) + int(minor or 0) / 100) * 0.1
return score
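To sanity-check the weights, here is a stripped-down restatement of the scoring (tie-break term omitted; the device names, versions, and states are hypothetical):

```python
# Weights copied from _score_simulator
COMMON_MODELS = [
    "iPhone 16 Pro",
    "iPhone 16",
    "iPhone 15 Pro",
    "iPhone 15",
    "iPhone SE (3rd generation)",
]

def score(name, ios, booted, last_used, latest_ios):
    s = 0.0
    if last_used == name:
        s += 100  # recently used dominates
    if ios == latest_ios:
        s += 50   # latest iOS
    for i, model in enumerate(COMMON_MODELS):
        if model in name:
            s += 30 - i * 2  # higher-ranked common models score more
            break
    if booted:
        s += 10   # already running
    return s

# A recently used device outranks a booted device on the latest iOS:
recent = score("iPhone 15", "17.5", False, "iPhone 15", "18.0")      # 100 + 24 = 124
booted_latest = score("iPhone 16 Pro", "18.0", True, None, "18.0")   # 50 + 30 + 10 = 90
assert recent > booted_latest
```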
def boot_simulator(self, udid: str) -> bool:
"""
Boot a simulator.
Args:
udid: Simulator UDID
Returns:
True if successful, False otherwise
"""
try:
subprocess.run(
["xcrun", "simctl", "boot", udid],
capture_output=True,
check=True,
)
return True
except subprocess.CalledProcessError as e:
print(f"Error booting simulator: {e.stderr}", file=sys.stderr)
return False
def format_suggestions(suggestions: list[SimulatorInfo], json_format: bool = False) -> str:
"""
Format suggestions for output.
Args:
suggestions: List of suggestions
json_format: If True, output as JSON
Returns:
Formatted string
"""
if json_format:
data = {"suggestions": [s.to_dict() for s in suggestions]}
return json.dumps(data, indent=2)
if not suggestions:
return "No simulators available"
lines = ["Available Simulators:\n"]
for i, sim in enumerate(suggestions, 1):
lines.append(f"{i}. {sim.name} (iOS {sim.ios_version})")
if sim.reasons:
lines.append(f" {', '.join(sim.reasons)}")
lines.append(f" UDID: {sim.udid}")
lines.append("")
return "\n".join(lines)
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(
description="Intelligent iOS simulator selector",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Get suggestions for user selection
python scripts/simulator_selector.py --suggest
# List all available simulators
python scripts/simulator_selector.py --list
# Boot a specific simulator
python scripts/simulator_selector.py --boot <UDID>
# Get suggestions as JSON
python scripts/simulator_selector.py --suggest --json
""",
)
parser.add_argument(
"--suggest",
action="store_true",
help="Get top simulator suggestions",
)
parser.add_argument(
"--list",
action="store_true",
help="List all available simulators",
)
parser.add_argument(
"--boot",
metavar="UDID",
help="Boot specific simulator by UDID",
)
parser.add_argument(
"--json",
action="store_true",
help="Output as JSON",
)
parser.add_argument(
"--count",
type=int,
default=4,
help="Number of suggestions (default: 4)",
)
args = parser.parse_args()
selector = SimulatorSelector()
if args.boot:
# Boot specific simulator
success = selector.boot_simulator(args.boot)
if success:
print(f"Booted simulator: {args.boot}")
return 0
return 1
if args.list:
# List all simulators
simulators = selector.list_simulators()
output = format_suggestions(simulators, args.json)
print(output)
return 0
if args.suggest:
# Get suggestions
selector.list_simulators()
suggestions = selector.get_suggestions(args.count)
output = format_suggestions(suggestions, args.json)
print(output)
return 0
# Default: show suggestions
selector.list_simulators()
suggestions = selector.get_suggestions(args.count)
output = format_suggestions(suggestions, args.json)
print(output)
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@@ -0,0 +1,250 @@
#!/usr/bin/env python3
"""
iOS Status Bar Controller
Override simulator status bar for clean screenshots and testing.
Control time, network, wifi, battery display.
Usage: python scripts/status_bar.py --preset clean
"""
import argparse
import subprocess
import sys
from common import resolve_udid
class StatusBarController:
"""Controls iOS simulator status bar appearance."""
# Preset configurations
PRESETS = {
"clean": {
"time": "9:41",
"data_network": "5g",
"wifi_mode": "active",
"battery_state": "charged",
"battery_level": 100,
},
"testing": {
"time": "11:11",
"data_network": "4g",
"wifi_mode": "active",
"battery_state": "discharging",
"battery_level": 50,
},
"low_battery": {
"time": "9:41",
"data_network": "5g",
"wifi_mode": "active",
"battery_state": "discharging",
"battery_level": 20,
},
"airplane": {
"time": "9:41",
"data_network": "none",
"wifi_mode": "failed",
"battery_state": "charged",
"battery_level": 100,
},
}
def __init__(self, udid: str | None = None):
"""Initialize status bar controller.
Args:
udid: Optional device UDID (auto-detects booted simulator if None)
"""
self.udid = udid
def override(
self,
time: str | None = None,
data_network: str | None = None,
wifi_mode: str | None = None,
battery_state: str | None = None,
battery_level: int | None = None,
) -> bool:
"""
Override status bar appearance.
Args:
time: Time in HH:MM format (e.g., "9:41")
data_network: Network type (none, 1x, 3g, 4g, 5g, lte, lte-a)
wifi_mode: WiFi state (active, searching, failed)
battery_state: Battery state (charging, charged, discharging)
battery_level: Battery percentage (0-100)
Returns:
Success status
"""
cmd = ["xcrun", "simctl", "status_bar"]
if self.udid:
cmd.append(self.udid)
else:
cmd.append("booted")
cmd.append("override")
# Add parameters if provided
if time:
cmd.extend(["--time", time])
if data_network:
cmd.extend(["--dataNetwork", data_network])
if wifi_mode:
cmd.extend(["--wifiMode", wifi_mode])
if battery_state:
cmd.extend(["--batteryState", battery_state])
if battery_level is not None:
cmd.extend(["--batteryLevel", str(battery_level)])
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
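For reference, this is the argv the method assembles for the "clean" preset against the booted device, built here without executing anything:

```python
# The "clean" preset values, as defined in PRESETS
preset = {
    "time": "9:41",
    "data_network": "5g",
    "wifi_mode": "active",
    "battery_state": "charged",
    "battery_level": 100,
}

cmd = ["xcrun", "simctl", "status_bar", "booted", "override"]
cmd += ["--time", preset["time"]]
cmd += ["--dataNetwork", preset["data_network"]]
cmd += ["--wifiMode", preset["wifi_mode"]]
cmd += ["--batteryState", preset["battery_state"]]
cmd += ["--batteryLevel", str(preset["battery_level"])]
print(" ".join(cmd))
```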
def clear(self) -> bool:
"""
Clear status bar override and restore defaults.
Returns:
Success status
"""
cmd = ["xcrun", "simctl", "status_bar"]
if self.udid:
cmd.append(self.udid)
else:
cmd.append("booted")
cmd.append("clear")
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(
description="Override iOS simulator status bar for screenshots and testing"
)
# Preset option
parser.add_argument(
"--preset",
choices=list(StatusBarController.PRESETS.keys()),
help="Use preset configuration (clean, testing, low_battery, airplane)",
)
# Custom options
parser.add_argument(
"--time",
help="Override time (HH:MM format, e.g., '9:41')",
)
parser.add_argument(
"--data-network",
choices=["none", "1x", "3g", "4g", "5g", "lte", "lte-a"],
help="Data network type",
)
parser.add_argument(
"--wifi-mode",
choices=["active", "searching", "failed"],
help="WiFi state",
)
parser.add_argument(
"--battery-state",
choices=["charging", "charged", "discharging"],
help="Battery state",
)
parser.add_argument(
"--battery-level",
type=int,
help="Battery level 0-100",
)
# Other options
parser.add_argument(
"--clear",
action="store_true",
help="Clear status bar override and restore defaults",
)
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
args = parser.parse_args()
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
controller = StatusBarController(udid=udid)
# Clear mode
if args.clear:
if controller.clear():
print("Status bar override cleared - defaults restored")
else:
print("Failed to clear status bar override")
sys.exit(1)
# Preset mode
elif args.preset:
preset = StatusBarController.PRESETS[args.preset]
if controller.override(**preset):
print(f"Status bar: {args.preset} preset applied")
print(
f" Time: {preset['time']}, "
f"Network: {preset['data_network']}, "
f"Battery: {preset['battery_level']}%"
)
else:
print(f"Failed to apply {args.preset} preset")
sys.exit(1)
# Custom mode
elif any(
[
args.time,
args.data_network,
args.wifi_mode,
args.battery_state,
args.battery_level is not None,
]
):
if controller.override(
time=args.time,
data_network=args.data_network,
wifi_mode=args.wifi_mode,
battery_state=args.battery_state,
battery_level=args.battery_level,
):
output = "Status bar override applied:"
if args.time:
output += f" Time={args.time}"
if args.data_network:
output += f" Network={args.data_network}"
if args.battery_level is not None:
output += f" Battery={args.battery_level}%"
print(output)
else:
print("Failed to override status bar")
sys.exit(1)
else:
parser.print_help()
sys.exit(1)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,323 @@
#!/usr/bin/env python3
"""
Test Recorder for iOS Simulator Testing
Records test execution with automatic screenshots and documentation.
Optimized for minimal token output during execution.
Usage:
As a script: python scripts/test_recorder.py --test-name "Test Name" --output dir/
As a module: from scripts.test_recorder import TestRecorder
"""
import argparse
import json
import subprocess
import time
from datetime import datetime
from pathlib import Path
from common import (
capture_screenshot,
count_elements,
get_accessibility_tree,
resolve_udid,
)
class TestRecorder:
"""Records test execution with screenshots and accessibility snapshots."""
def __init__(
self,
test_name: str,
output_dir: str = "test-artifacts",
udid: str | None = None,
inline: bool = False,
screenshot_size: str = "half",
app_name: str | None = None,
):
"""
Initialize test recorder.
Args:
test_name: Name of the test being recorded
output_dir: Directory for test artifacts
udid: Optional device UDID (uses booted if not specified)
inline: If True, return screenshots as base64 (for vision-based automation)
screenshot_size: 'full', 'half', 'quarter', 'thumb' (default: 'half')
app_name: App name for semantic screenshot naming
"""
self.test_name = test_name
self.udid = udid
self.inline = inline
self.screenshot_size = screenshot_size
self.app_name = app_name
self.start_time = time.time()
self.steps: list[dict] = []
self.current_step = 0
# Create timestamped output directory
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
safe_name = test_name.lower().replace(" ", "-")
self.output_dir = Path(output_dir) / f"{safe_name}-{timestamp}"
self.output_dir.mkdir(parents=True, exist_ok=True)
# Create subdirectories (only if not in inline mode)
if not inline:
self.screenshots_dir = self.output_dir / "screenshots"
self.screenshots_dir.mkdir(exist_ok=True)
else:
self.screenshots_dir = None
self.accessibility_dir = self.output_dir / "accessibility"
self.accessibility_dir.mkdir(exist_ok=True)
# Token-efficient output
mode_str = "(inline mode)" if inline else ""
print(f"Recording: {test_name} {mode_str}")
print(f"Output: {self.output_dir}/")
def step(
self,
description: str,
screen_name: str | None = None,
state: str | None = None,
assertion: str | None = None,
metadata: dict | None = None,
):
"""
Record a test step with automatic screenshot.
Args:
description: Step description
screen_name: Screen name for semantic naming
state: State description for semantic naming
assertion: Optional assertion to verify
metadata: Optional metadata for the step
"""
self.current_step += 1
step_time = time.time() - self.start_time
# Capture screenshot using new utility
screenshot_result = capture_screenshot(
self.udid,
size=self.screenshot_size,
inline=self.inline,
app_name=self.app_name,
screen_name=screen_name or description,
state=state,
)
# Capture accessibility tree
accessibility_path = (
self.accessibility_dir
/ f"{self.current_step:03d}-{description.lower().replace(' ', '-')[:20]}.json"
)
element_count = self._capture_accessibility(accessibility_path)
# Store step data
step_data = {
"number": self.current_step,
"description": description,
"timestamp": step_time,
"element_count": element_count,
"accessibility": accessibility_path.name,
"screenshot_mode": screenshot_result["mode"],
"screenshot_size": self.screenshot_size,
}
# Handle screenshot data based on mode
if screenshot_result["mode"] == "file":
step_data["screenshot"] = screenshot_result["file_path"]
step_data["screenshot_name"] = Path(screenshot_result["file_path"]).name
else:
# Inline mode
step_data["screenshot_base64"] = screenshot_result["base64_data"]
step_data["screenshot_dimensions"] = (
screenshot_result["width"],
screenshot_result["height"],
)
if assertion:
step_data["assertion"] = assertion
# NOTE: assertions are recorded for the report but not evaluated here
step_data["assertion_passed"] = True
if metadata:
step_data["metadata"] = metadata
self.steps.append(step_data)
# Token-efficient output (single line)
status = "✓" if not assertion or step_data.get("assertion_passed") else "✗"
screenshot_info = (
f" [{screenshot_result['width']}x{screenshot_result['height']}]" if self.inline else ""
)
print(
f"{status} Step {self.current_step}: {description} ({step_time:.1f}s){screenshot_info}"
)
def _capture_screenshot(self, output_path: Path) -> bool:
"""Capture screenshot using simctl."""
cmd = ["xcrun", "simctl", "io"]
if self.udid:
cmd.append(self.udid)
else:
cmd.append("booted")
cmd.extend(["screenshot", str(output_path)])
try:
subprocess.run(cmd, capture_output=True, check=True)
return True
except subprocess.CalledProcessError:
return False
def _capture_accessibility(self, output_path: Path) -> int:
"""Capture accessibility tree and return element count."""
try:
# Use shared utility to fetch tree
tree = get_accessibility_tree(self.udid, nested=True)
# Save tree
with open(output_path, "w") as f:
json.dump(tree, f, indent=2)
# Count elements using shared utility
return count_elements(tree)
except Exception:
return 0
def generate_report(self) -> dict[str, str]:
"""
Generate markdown test report.
Returns:
Dictionary with paths to generated files
"""
duration = time.time() - self.start_time
report_path = self.output_dir / "report.md"
# Generate markdown
with open(report_path, "w") as f:
f.write(f"# Test Report: {self.test_name}\n\n")
f.write(f"**Date:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
f.write(f"**Duration:** {duration:.1f} seconds\n")
f.write(f"**Steps:** {len(self.steps)}\n\n")
# Steps section
f.write("## Test Steps\n\n")
for step in self.steps:
f.write(
f"### Step {step['number']}: {step['description']} ({step['timestamp']:.1f}s)\n\n"
)
if step.get("screenshot_name"):
f.write(f"![Screenshot](screenshots/{step['screenshot_name']})\n\n")
if step.get("assertion"):
status = "✓" if step.get("assertion_passed") else "✗"
f.write(f"**Assertion:** {step['assertion']} {status}\n\n")
if step.get("metadata"):
f.write("**Metadata:**\n")
for key, value in step["metadata"].items():
f.write(f"- {key}: {value}\n")
f.write("\n")
f.write(f"**Accessibility Elements:** {step['element_count']}\n\n")
f.write("---\n\n")
# Summary
f.write("## Summary\n\n")
f.write(f"- Total steps: {len(self.steps)}\n")
f.write(f"- Duration: {duration:.1f}s\n")
f.write(f"- Screenshots: {len(self.steps)}\n")
f.write(f"- Accessibility snapshots: {len(self.steps)}\n")
# Save metadata JSON
metadata_path = self.output_dir / "metadata.json"
with open(metadata_path, "w") as f:
json.dump(
{
"test_name": self.test_name,
"duration": duration,
"steps": self.steps,
"timestamp": datetime.now().isoformat(),
},
f,
indent=2,
)
# Token-efficient output
print(f"Report: {report_path}")
return {
"markdown_path": str(report_path),
"metadata_path": str(metadata_path),
"output_dir": str(self.output_dir),
}
def main():
"""Main entry point for command-line usage."""
parser = argparse.ArgumentParser(
description="Record test execution with screenshots and documentation"
)
parser.add_argument("--test-name", required=True, help="Name of the test being recorded")
parser.add_argument(
"--output", default="test-artifacts", help="Output directory for test artifacts"
)
parser.add_argument(
"--udid",
help="Device UDID (auto-detects booted simulator if not provided)",
)
parser.add_argument(
"--inline",
action="store_true",
help="Return screenshots as base64 (inline mode for vision-based automation)",
)
parser.add_argument(
"--size",
choices=["full", "half", "quarter", "thumb"],
default="half",
help="Screenshot size for token optimization (default: half)",
)
parser.add_argument("--app-name", help="App name for semantic screenshot naming")
args = parser.parse_args()
# Resolve UDID with auto-detection
try:
udid = resolve_udid(args.udid)
except RuntimeError as e:
import sys
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
# Create recorder
TestRecorder(
test_name=args.test_name,
output_dir=args.output,
udid=udid,
inline=args.inline,
screenshot_size=args.size,
app_name=args.app_name,
)
print("Test recorder initialized. Use the following methods:")
print(' recorder.step("description") - Record a test step')
print(" recorder.generate_report() - Generate final report")
print()
print("Example:")
print(' recorder.step("Launch app", screen_name="Splash")')
print(
' recorder.step("Enter credentials", screen_name="Login", state="Empty", metadata={"user": "test"})'
)
print(' recorder.step("Verify login", assertion="Home screen visible")')
print(" recorder.generate_report()")
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,235 @@
#!/usr/bin/env python3
"""
Visual Diff Tool for iOS Simulator Screenshots
Compares two screenshots pixel-by-pixel to detect visual changes.
Optimized for minimal token output.
Usage: python scripts/visual_diff.py baseline.png current.png [options]
"""
import argparse
import json
import os
import sys
from pathlib import Path
try:
from PIL import Image, ImageChops
except ImportError:
print("Error: Pillow not installed. Run: pip3 install pillow")
sys.exit(1)
class VisualDiffer:
"""Performs visual comparison between screenshots."""
def __init__(self, threshold: float = 0.01):
"""
Initialize differ with threshold.
Args:
threshold: Maximum acceptable difference ratio (0.01 = 1%)
"""
self.threshold = threshold
def compare(self, baseline_path: str, current_path: str) -> dict:
"""
Compare two images and return difference metrics.
Args:
baseline_path: Path to baseline image
current_path: Path to current image
Returns:
Dictionary with comparison results
"""
# Load images
try:
baseline = Image.open(baseline_path)
current = Image.open(current_path)
except FileNotFoundError as e:
print(f"Error: Image not found - {e}")
sys.exit(1)
except Exception as e:
print(f"Error: Failed to load image - {e}")
sys.exit(1)
# Verify dimensions match
if baseline.size != current.size:
return {
"error": "Image dimensions do not match",
"baseline_size": baseline.size,
"current_size": current.size,
}
# Convert to RGB if needed
if baseline.mode != "RGB":
baseline = baseline.convert("RGB")
if current.mode != "RGB":
current = current.convert("RGB")
# Calculate difference
diff = ImageChops.difference(baseline, current)
# Calculate metrics
total_pixels = baseline.size[0] * baseline.size[1]
diff_pixels = self._count_different_pixels(diff)
diff_percentage = (diff_pixels / total_pixels) * 100
# Determine pass/fail
passed = diff_percentage <= (self.threshold * 100)
return {
"dimensions": baseline.size,
"total_pixels": total_pixels,
"different_pixels": diff_pixels,
"difference_percentage": round(diff_percentage, 2),
"threshold_percentage": self.threshold * 100,
"passed": passed,
"verdict": "PASS" if passed else "FAIL",
}
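The pass/fail arithmetic reduces to a few lines. On a hypothetical 100×50 frame with 37 changed pixels and the default 1% threshold:

```python
total_pixels = 100 * 50          # 5000 pixels in the frame
diff_pixels = 37                 # hypothetical count of changed pixels
diff_percentage = diff_pixels / total_pixels * 100
threshold = 0.01                 # up to 1% of pixels may differ

passed = diff_percentage <= threshold * 100
print(round(diff_percentage, 2), passed)  # 0.74 True
```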
def _count_different_pixels(self, diff_image: Image.Image) -> int:
"""Count number of pixels that are different."""
# Convert to grayscale for easier processing
diff_gray = diff_image.convert("L")
# Count non-zero pixels (different)
pixels = diff_gray.getdata()
return sum(1 for pixel in pixels if pixel > 10) # Threshold for noise
def generate_diff_image(self, baseline_path: str, current_path: str, output_path: str) -> None:
"""Generate highlighted difference image."""
baseline = Image.open(baseline_path).convert("RGB")
current = Image.open(current_path).convert("RGB")
# Create difference image
diff = ImageChops.difference(baseline, current)
# Enhance differences with red overlay
diff_enhanced = Image.new("RGB", baseline.size)
for x in range(baseline.size[0]):
for y in range(baseline.size[1]):
diff_pixel = diff.getpixel((x, y))
if sum(diff_pixel) > 30: # Threshold for visibility
# Highlight in red
diff_enhanced.putpixel((x, y), (255, 0, 0))
else:
# Keep original
diff_enhanced.putpixel((x, y), current.getpixel((x, y)))
diff_enhanced.save(output_path)
def generate_side_by_side(
self, baseline_path: str, current_path: str, output_path: str
) -> None:
"""Generate side-by-side comparison image."""
baseline = Image.open(baseline_path)
current = Image.open(current_path)
# Create combined image
width = baseline.size[0] * 2 + 10 # 10px separator
height = max(baseline.size[1], current.size[1])
combined = Image.new("RGB", (width, height), color=(128, 128, 128))
# Paste images
combined.paste(baseline, (0, 0))
combined.paste(current, (baseline.size[0] + 10, 0))
combined.save(output_path)
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Compare screenshots for visual differences")
parser.add_argument("baseline", help="Path to baseline screenshot")
parser.add_argument("current", help="Path to current screenshot")
parser.add_argument(
"--output",
default=".",
help="Output directory for diff artifacts (default: current directory)",
)
parser.add_argument(
"--threshold",
type=float,
default=0.01,
help="Acceptable difference threshold (0.01 = 1%%, default: 0.01)",
)
parser.add_argument(
"--details", action="store_true", help="Show detailed output (increases tokens)"
)
args = parser.parse_args()
# Create output directory if needed
output_dir = Path(args.output)
output_dir.mkdir(parents=True, exist_ok=True)
# Initialize differ
differ = VisualDiffer(threshold=args.threshold)
# Perform comparison
result = differ.compare(args.baseline, args.current)
# Handle dimension mismatch
if "error" in result:
print(f"Error: {result['error']}")
print(f"Baseline: {result['baseline_size']}")
print(f"Current: {result['current_size']}")
sys.exit(1)
# Generate artifacts
diff_image_path = output_dir / "diff.png"
comparison_image_path = output_dir / "side-by-side.png"
try:
differ.generate_diff_image(args.baseline, args.current, str(diff_image_path))
differ.generate_side_by_side(args.baseline, args.current, str(comparison_image_path))
except Exception as e:
print(f"Warning: Could not generate images - {e}")
# Output results (token-optimized)
if args.details:
# Detailed output
report = {
"summary": {
"baseline": args.baseline,
"current": args.current,
"threshold": args.threshold,
"passed": result["passed"],
},
"results": result,
"artifacts": {
"diff_image": str(diff_image_path),
"comparison_image": str(comparison_image_path),
},
}
print(json.dumps(report, indent=2))
else:
# Minimal output (default)
print(f"Difference: {result['difference_percentage']}% ({result['verdict']})")
if result["different_pixels"] > 0:
print(f"Changed pixels: {result['different_pixels']:,}")
print(f"Artifacts saved to: {output_dir}/")
# Save JSON report
report_path = output_dir / "diff-report.json"
with open(report_path, "w") as f:
json.dump(
{
"baseline": os.path.basename(args.baseline),
"current": os.path.basename(args.current),
"results": result,
"artifacts": {"diff": "diff.png", "comparison": "side-by-side.png"},
},
f,
indent=2,
)
# Exit with error if test failed
sys.exit(0 if result["passed"] else 1)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,13 @@
"""
Xcode build automation module.
Provides structured, modular access to xcodebuild and xcresult functionality.
"""
from .builder import BuildRunner
from .cache import XCResultCache
from .config import Config
from .reporter import OutputFormatter
from .xcresult import XCResultParser
__all__ = ["BuildRunner", "Config", "OutputFormatter", "XCResultCache", "XCResultParser"]

View File

@@ -0,0 +1,381 @@
"""
Xcode build execution.
Handles xcodebuild command construction and execution with xcresult generation.
"""
import re
import subprocess
import sys
from pathlib import Path
from .cache import XCResultCache
from .config import Config
class BuildRunner:
"""
Execute xcodebuild commands with xcresult bundle generation.
Handles scheme auto-detection, command construction, and build/test execution.
"""
def __init__(
self,
project_path: str | None = None,
workspace_path: str | None = None,
scheme: str | None = None,
configuration: str = "Debug",
simulator: str | None = None,
cache: XCResultCache | None = None,
):
"""
Initialize build runner.
Args:
project_path: Path to .xcodeproj
workspace_path: Path to .xcworkspace
scheme: Build scheme (auto-detected if not provided)
configuration: Build configuration (Debug/Release)
simulator: Simulator name
cache: XCResult cache (creates default if not provided)
"""
self.project_path = project_path
self.workspace_path = workspace_path
self.scheme = scheme
self.configuration = configuration
self.simulator = simulator
self.cache = cache or XCResultCache()
def auto_detect_scheme(self) -> str | None:
"""
Auto-detect build scheme from project/workspace.
Returns:
Detected scheme name or None
"""
cmd = ["xcodebuild", "-list"]
if self.workspace_path:
cmd.extend(["-workspace", self.workspace_path])
elif self.project_path:
cmd.extend(["-project", self.project_path])
else:
return None
try:
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
# Parse schemes from output
in_schemes_section = False
for line in result.stdout.split("\n"):
line = line.strip()
if "Schemes:" in line:
in_schemes_section = True
continue
if in_schemes_section and line and not line.startswith("Build"):
# First scheme in list
return line
except subprocess.CalledProcessError as e:
print(f"Error auto-detecting scheme: {e}", file=sys.stderr)
return None
def get_simulator_destination(self) -> str:
"""
Get xcodebuild destination string.
Uses config preferences with fallback to auto-detection.
Priority:
1. --simulator CLI flag (self.simulator)
2. Config preferred_simulator
3. Config last_used_simulator
4. Auto-detect first iPhone
5. Generic iOS Simulator
Returns:
Destination string for -destination flag
"""
# Priority 1: CLI flag
if self.simulator:
return f"platform=iOS Simulator,name={self.simulator}"
# Priority 2-3: Config preferences
try:
# Determine project directory from project/workspace path
project_dir = None
if self.project_path:
project_dir = Path(self.project_path).parent
elif self.workspace_path:
project_dir = Path(self.workspace_path).parent
config = Config.load(project_dir=project_dir)
preferred = config.get_preferred_simulator()
if preferred:
# Check if preferred simulator exists
if self._simulator_exists(preferred):
return f"platform=iOS Simulator,name={preferred}"
print(f"Warning: Preferred simulator '{preferred}' not available", file=sys.stderr)
if config.should_fallback_to_any_iphone():
print("Falling back to auto-detection...", file=sys.stderr)
else:
# Strict mode: don't fallback
return f"platform=iOS Simulator,name={preferred}"
except Exception as e:
print(f"Warning: Could not load config: {e}", file=sys.stderr)
# Priority 4-5: Auto-detect
return self._auto_detect_simulator()
def _simulator_exists(self, name: str) -> bool:
"""
Check if simulator with given name exists and is available.
Args:
name: Simulator name (e.g., "iPhone 16 Pro")
Returns:
True if simulator exists and is available
"""
try:
result = subprocess.run(
["xcrun", "simctl", "list", "devices", "available", "iOS"],
capture_output=True,
text=True,
check=True,
)
# Check if simulator name appears in available devices
return any(name in line and "(" in line for line in result.stdout.split("\n"))
except subprocess.CalledProcessError:
return False
def _extract_simulator_name_from_destination(self, destination: str) -> str | None:
"""
Extract simulator name from destination string.
Args:
destination: Destination string (e.g., "platform=iOS Simulator,name=iPhone 16 Pro")
Returns:
Simulator name or None
"""
# Pattern: name=<simulator name>
match = re.search(r"name=([^,]+)", destination)
if match:
return match.group(1).strip()
return None
def _auto_detect_simulator(self) -> str:
"""
Auto-detect best available iOS simulator.
Returns:
Destination string for -destination flag
"""
try:
result = subprocess.run(
["xcrun", "simctl", "list", "devices", "available", "iOS"],
capture_output=True,
text=True,
check=True,
)
# Parse available simulators; take the first iPhone listed
# Lines look like: "iPhone 16 Pro (12345678-1234-1234-1234-123456789012) (Shutdown)"
for line in result.stdout.split("\n"):
if "iPhone" in line and "(" in line:
# Extract device name
name = line.split("(")[0].strip()
if name:
return f"platform=iOS Simulator,name={name}"
# Fallback to generic iOS Simulator if no iPhone found
return "generic/platform=iOS Simulator"
except subprocess.CalledProcessError as e:
print(f"Warning: Could not auto-detect simulator: {e}", file=sys.stderr)
return "generic/platform=iOS Simulator"
def build(self, clean: bool = False) -> tuple[bool, str, str]:
"""
Build the project.
Args:
clean: Perform clean build
Returns:
Tuple of (success: bool, xcresult_id: str, stderr: str)
"""
# Auto-detect scheme if needed
if not self.scheme:
self.scheme = self.auto_detect_scheme()
if not self.scheme:
print("Error: Could not auto-detect scheme. Use --scheme", file=sys.stderr)
return (False, "", "")
# Generate xcresult ID and path
xcresult_id = self.cache.generate_id()
xcresult_path = self.cache.get_path(xcresult_id)
# Build command
cmd = ["xcodebuild", "-quiet"] # Suppress verbose output
if clean:
cmd.append("clean")
cmd.append("build")
if self.workspace_path:
cmd.extend(["-workspace", self.workspace_path])
elif self.project_path:
cmd.extend(["-project", self.project_path])
else:
print("Error: No project or workspace specified", file=sys.stderr)
return (False, "", "")
cmd.extend(
[
"-scheme",
self.scheme,
"-configuration",
self.configuration,
"-destination",
self.get_simulator_destination(),
"-resultBundlePath",
str(xcresult_path),
]
)
# Execute build
try:
result = subprocess.run(
cmd, capture_output=True, text=True, check=False # Don't raise on non-zero exit
)
success = result.returncode == 0
# xcresult bundle should be created even on failure
if not xcresult_path.exists():
print("Warning: xcresult bundle was not created", file=sys.stderr)
return (success, "", result.stderr)
# Auto-update config with last used simulator (on success only)
if success:
try:
# Determine project directory from project/workspace path
project_dir = None
if self.project_path:
project_dir = Path(self.project_path).parent
elif self.workspace_path:
project_dir = Path(self.workspace_path).parent
config = Config.load(project_dir=project_dir)
destination = self.get_simulator_destination()
simulator_name = self._extract_simulator_name_from_destination(destination)
if simulator_name:
config.update_last_used_simulator(simulator_name)
config.save()
except Exception as e:
# Don't fail build if config update fails
print(f"Warning: Could not update config: {e}", file=sys.stderr)
return (success, xcresult_id, result.stderr)
except Exception as e:
print(f"Error executing build: {e}", file=sys.stderr)
return (False, "", str(e))
def test(self, test_suite: str | None = None) -> tuple[bool, str, str]:
"""
Run tests.
Args:
test_suite: Specific test suite to run
Returns:
Tuple of (success: bool, xcresult_id: str, stderr: str)
"""
# Auto-detect scheme if needed
if not self.scheme:
self.scheme = self.auto_detect_scheme()
if not self.scheme:
print("Error: Could not auto-detect scheme. Use --scheme", file=sys.stderr)
return (False, "", "")
# Generate xcresult ID and path
xcresult_id = self.cache.generate_id()
xcresult_path = self.cache.get_path(xcresult_id)
# Build command
cmd = ["xcodebuild", "-quiet", "test"]
if self.workspace_path:
cmd.extend(["-workspace", self.workspace_path])
elif self.project_path:
cmd.extend(["-project", self.project_path])
else:
print("Error: No project or workspace specified", file=sys.stderr)
return (False, "", "")
cmd.extend(
[
"-scheme",
self.scheme,
"-destination",
self.get_simulator_destination(),
"-resultBundlePath",
str(xcresult_path),
]
)
if test_suite:
cmd.extend(["-only-testing", test_suite])
# Execute tests
try:
result = subprocess.run(cmd, capture_output=True, text=True, check=False)
success = result.returncode == 0
# xcresult bundle should be created even on failure
if not xcresult_path.exists():
print("Warning: xcresult bundle was not created", file=sys.stderr)
return (success, "", result.stderr)
# Auto-update config with last used simulator (on success only)
if success:
try:
# Determine project directory from project/workspace path
project_dir = None
if self.project_path:
project_dir = Path(self.project_path).parent
elif self.workspace_path:
project_dir = Path(self.workspace_path).parent
config = Config.load(project_dir=project_dir)
destination = self.get_simulator_destination()
simulator_name = self._extract_simulator_name_from_destination(destination)
if simulator_name:
config.update_last_used_simulator(simulator_name)
config.save()
except Exception as e:
# Don't fail test if config update fails
print(f"Warning: Could not update config: {e}", file=sys.stderr)
return (success, xcresult_id, result.stderr)
except Exception as e:
print(f"Error executing tests: {e}", file=sys.stderr)
return (False, "", str(e))

View File

@@ -0,0 +1,204 @@
"""
XCResult cache management.
Handles storage, retrieval, and lifecycle of xcresult bundles for progressive disclosure.
"""
import shutil
from datetime import datetime
from pathlib import Path
class XCResultCache:
"""
Manage xcresult bundle cache for progressive disclosure.
Stores xcresult bundles with timestamp-based IDs and provides
retrieval and cleanup operations.
"""
# Default cache directory
DEFAULT_CACHE_DIR = Path.home() / ".ios-simulator-skill" / "xcresults"
def __init__(self, cache_dir: Path | None = None):
"""
Initialize cache manager.
Args:
cache_dir: Custom cache directory (uses default if not specified)
"""
self.cache_dir = cache_dir or self.DEFAULT_CACHE_DIR
self.cache_dir.mkdir(parents=True, exist_ok=True)
def generate_id(self, prefix: str = "xcresult") -> str:
"""
Generate timestamped xcresult ID.
Args:
prefix: ID prefix (default: "xcresult")
Returns:
ID string like "xcresult-20251018-143052"
"""
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
return f"{prefix}-{timestamp}"
def get_path(self, xcresult_id: str) -> Path:
"""
Get full path for xcresult ID.
Args:
xcresult_id: XCResult ID
Returns:
Path to xcresult bundle
"""
# Handle both with and without .xcresult extension
if xcresult_id.endswith(".xcresult"):
return self.cache_dir / xcresult_id
return self.cache_dir / f"{xcresult_id}.xcresult"
def exists(self, xcresult_id: str) -> bool:
"""
Check if xcresult bundle exists.
Args:
xcresult_id: XCResult ID
Returns:
True if bundle exists
"""
return self.get_path(xcresult_id).exists()
def save(self, source_path: Path, xcresult_id: str | None = None) -> str:
"""
Save xcresult bundle to cache.
Args:
source_path: Source xcresult bundle path
xcresult_id: Optional custom ID (generates if not provided)
Returns:
xcresult ID
"""
if not source_path.exists():
raise FileNotFoundError(f"Source xcresult not found: {source_path}")
# Generate ID if not provided
if not xcresult_id:
xcresult_id = self.generate_id()
# Get destination path
dest_path = self.get_path(xcresult_id)
# Copy xcresult bundle (it's a directory)
if dest_path.exists():
shutil.rmtree(dest_path)
shutil.copytree(source_path, dest_path)
return xcresult_id
def list(self, limit: int = 10) -> list[dict]:
"""
List recent xcresult bundles.
Args:
limit: Maximum number to return
Returns:
List of xcresult metadata dicts
"""
if not self.cache_dir.exists():
return []
results = []
for path in sorted(
self.cache_dir.glob("*.xcresult"), key=lambda p: p.stat().st_mtime, reverse=True
)[:limit]:
# Calculate bundle size
size_bytes = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
results.append(
{
"id": path.stem,
"path": str(path),
"created": datetime.fromtimestamp(path.stat().st_mtime).isoformat(),
"size_mb": round(size_bytes / (1024 * 1024), 2),
}
)
return results
def cleanup(self, keep_recent: int = 20) -> int:
"""
Clean up old xcresult bundles.
Args:
keep_recent: Number of recent bundles to keep
Returns:
Number of bundles removed
"""
if not self.cache_dir.exists():
return 0
# Get all bundles sorted by modification time
all_bundles = sorted(
self.cache_dir.glob("*.xcresult"), key=lambda p: p.stat().st_mtime, reverse=True
)
# Remove old bundles
removed = 0
for bundle_path in all_bundles[keep_recent:]:
shutil.rmtree(bundle_path)
removed += 1
return removed
def get_size_mb(self, xcresult_id: str) -> float:
"""
Get size of xcresult bundle in MB.
Args:
xcresult_id: XCResult ID
Returns:
Size in MB
"""
path = self.get_path(xcresult_id)
if not path.exists():
return 0.0
size_bytes = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
return round(size_bytes / (1024 * 1024), 2)
def save_stderr(self, xcresult_id: str, stderr: str) -> None:
"""
Save stderr output alongside xcresult bundle.
Args:
xcresult_id: XCResult ID
stderr: stderr output from xcodebuild
"""
if not stderr:
return
stderr_path = self.cache_dir / f"{xcresult_id}.stderr"
stderr_path.write_text(stderr, encoding="utf-8")
def get_stderr(self, xcresult_id: str) -> str:
"""
Retrieve cached stderr output.
Args:
xcresult_id: XCResult ID
Returns:
stderr content or empty string if not found
"""
stderr_path = self.cache_dir / f"{xcresult_id}.stderr"
if not stderr_path.exists():
return ""
return stderr_path.read_text(encoding="utf-8")
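The cleanup policy above keeps the `keep_recent` most recently modified bundles and removes the rest. A self-contained sketch of that logic against a throwaway directory (empty stand-in bundles, not real xcresult data):

```python
import shutil
import tempfile
from pathlib import Path

# Mirrors XCResultCache.cleanup with an isolated, temporary cache directory
cache_dir = Path(tempfile.mkdtemp())
for i in range(5):
    # Empty directories stand in for xcresult bundles (which are directories too)
    (cache_dir / f"xcresult-2025101{i}-000000.xcresult").mkdir()

# Sort newest-first by mtime, then drop everything past keep_recent
bundles = sorted(cache_dir.glob("*.xcresult"), key=lambda p: p.stat().st_mtime, reverse=True)
keep_recent = 2
removed = 0
for old in bundles[keep_recent:]:
    shutil.rmtree(old)
    removed += 1

print(removed)  # 3
print(len(list(cache_dir.glob("*.xcresult"))))  # 2
```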

View File

@@ -0,0 +1,178 @@
"""
Configuration management for iOS Simulator Skill.
Handles loading, validation, and auto-updating of project-local config files.
"""
import json
import sys
from datetime import datetime
from pathlib import Path
from typing import Any
class Config:
"""
Project-local configuration with auto-learning.
Config file location: .claude/skills/<skill-directory-name>/config.json
The skill directory name is auto-detected from the installation location,
so configs work regardless of what users name the skill directory.
Auto-updates last_used_simulator after successful builds.
"""
DEFAULT_CONFIG = {
"device": {
"preferred_simulator": None,
"preferred_os_version": None,
"fallback_to_any_iphone": True,
"last_used_simulator": None,
"last_used_at": None,
}
}
def __init__(self, data: dict[str, Any], config_path: Path):
"""
Initialize config.
Args:
data: Config data dict
config_path: Path to config file
"""
self.data = data
self.config_path = config_path
@staticmethod
def load(project_dir: Path | None = None) -> "Config":
"""
Load config from project directory.
Args:
project_dir: Project root (defaults to cwd)
Returns:
Config instance (creates default if not found)
Note:
The skill directory name is auto-detected from the installation location,
so configs work regardless of what users name the skill directory.
"""
if project_dir is None:
project_dir = Path.cwd()
# Auto-detect skill directory name from actual installation location
# This file is at: skill/scripts/xcode/config.py
# Navigate up to skill/ directory and use its name
skill_root = Path(__file__).parent.parent.parent # xcode/ -> scripts/ -> skill/
skill_name = skill_root.name
config_path = project_dir / ".claude" / "skills" / skill_name / "config.json"
# Load existing config
if config_path.exists():
try:
with open(config_path) as f:
data = json.load(f)
# Merge with defaults (in case new fields added)
merged = Config._merge_with_defaults(data)
return Config(merged, config_path)
except json.JSONDecodeError as e:
print(f"Warning: Invalid JSON in {config_path}: {e}", file=sys.stderr)
print("Using default config", file=sys.stderr)
# Build defaults via _merge_with_defaults so callers never share DEFAULT_CONFIG's nested dicts
return Config(Config._merge_with_defaults({}), config_path)
except Exception as e:
print(f"Warning: Could not load config: {e}", file=sys.stderr)
return Config(Config._merge_with_defaults({}), config_path)
# Return default config (will be created on first save)
return Config(Config._merge_with_defaults({}), config_path)
@staticmethod
def _merge_with_defaults(data: dict[str, Any]) -> dict[str, Any]:
"""
Merge user config with defaults.
Args:
data: User config data
Returns:
Merged config with all default fields
"""
# Copy each section so updates below never mutate the class-level DEFAULT_CONFIG
merged = {section: dict(values) for section, values in Config.DEFAULT_CONFIG.items()}
# Deep merge device section
if "device" in data:
merged["device"].update(data["device"])
return merged
def save(self) -> None:
"""
Save config to file atomically.
Uses temp file + rename for atomic writes.
Creates parent directories if needed.
"""
try:
# Create parent directories
self.config_path.parent.mkdir(parents=True, exist_ok=True)
# Atomic write: temp file + rename
temp_path = self.config_path.with_suffix(".tmp")
with open(temp_path, "w") as f:
json.dump(self.data, f, indent=2)
f.write("\n") # Trailing newline
# Atomic rename
temp_path.replace(self.config_path)
except Exception as e:
print(f"Warning: Could not save config: {e}", file=sys.stderr)
def update_last_used_simulator(self, name: str) -> None:
"""
Update last used simulator and timestamp.
Args:
name: Simulator name (e.g., "iPhone 16 Pro")
"""
self.data["device"]["last_used_simulator"] = name
self.data["device"]["last_used_at"] = datetime.utcnow().isoformat() + "Z"
def get_preferred_simulator(self) -> str | None:
"""
Get preferred simulator.
Returns:
Simulator name or None
Priority:
1. preferred_simulator (manual preference)
2. last_used_simulator (auto-learned)
3. None (use auto-detection)
"""
device = self.data.get("device", {})
# Manual preference takes priority
if device.get("preferred_simulator"):
return device["preferred_simulator"]
# Auto-learned preference
if device.get("last_used_simulator"):
return device["last_used_simulator"]
return None
def should_fallback_to_any_iphone(self) -> bool:
"""
Check if fallback to any iPhone is enabled.
Returns:
True if should fallback, False otherwise
"""
return self.data.get("device", {}).get("fallback_to_any_iphone", True)

View File

@@ -0,0 +1,291 @@
"""
Build/test output formatting.
Provides multiple output formats with progressive disclosure support.
"""
import json
class OutputFormatter:
"""
Format build/test results for display.
Supports ultra-minimal default output, verbose mode, and JSON output.
"""
@staticmethod
def format_minimal(
status: str,
error_count: int,
warning_count: int,
xcresult_id: str,
test_info: dict | None = None,
hints: list[str] | None = None,
) -> str:
"""
Format ultra-minimal output (5-10 tokens).
Args:
status: Build status (SUCCESS/FAILED)
error_count: Number of errors
warning_count: Number of warnings
xcresult_id: XCResult bundle ID
test_info: Optional test results dict
hints: Optional list of actionable hints
Returns:
Minimal formatted string
Example:
Build: SUCCESS (0 errors, 3 warnings) [xcresult-20251018-143052]
Tests: PASS (12/12 passed, 4.2s) [xcresult-20251018-143052]
"""
lines = []
if test_info:
# Test mode
total = test_info.get("total", 0)
passed = test_info.get("passed", 0)
failed = test_info.get("failed", 0)
duration = test_info.get("duration", 0.0)
test_status = "PASS" if failed == 0 else "FAIL"
lines.append(
f"Tests: {test_status} ({passed}/{total} passed, {duration:.1f}s) [{xcresult_id}]"
)
else:
# Build mode
lines.append(
f"Build: {status} ({error_count} errors, {warning_count} warnings) [{xcresult_id}]"
)
# Add hints if provided and build failed
if hints and status == "FAILED":
lines.append("")
lines.extend(hints)
return "\n".join(lines)
@staticmethod
def format_errors(errors: list[dict], limit: int = 10) -> str:
"""
Format error details.
Args:
errors: List of error dicts
limit: Maximum errors to show
Returns:
Formatted error list
"""
if not errors:
return "No errors found."
lines = [f"Errors ({len(errors)}):"]
lines.append("")
for i, error in enumerate(errors[:limit], 1):
message = error.get("message", "Unknown error")
location = error.get("location", {})
# Format location
loc_parts = []
if location.get("file"):
file_path = location["file"].replace("file://", "")
loc_parts.append(file_path)
if location.get("line"):
loc_parts.append(f"line {location['line']}")
location_str = ":".join(loc_parts) if loc_parts else "unknown location"
lines.append(f"{i}. {message}")
lines.append(f" Location: {location_str}")
lines.append("")
if len(errors) > limit:
lines.append(f"... and {len(errors) - limit} more errors")
return "\n".join(lines)
@staticmethod
def format_warnings(warnings: list[dict], limit: int = 10) -> str:
"""
Format warning details.
Args:
warnings: List of warning dicts
limit: Maximum warnings to show
Returns:
Formatted warning list
"""
if not warnings:
return "No warnings found."
lines = [f"Warnings ({len(warnings)}):"]
lines.append("")
for i, warning in enumerate(warnings[:limit], 1):
message = warning.get("message", "Unknown warning")
location = warning.get("location", {})
# Format location
loc_parts = []
if location.get("file"):
file_path = location["file"].replace("file://", "")
loc_parts.append(file_path)
if location.get("line"):
loc_parts.append(f"line {location['line']}")
location_str = ":".join(loc_parts) if loc_parts else "unknown location"
lines.append(f"{i}. {message}")
lines.append(f" Location: {location_str}")
lines.append("")
if len(warnings) > limit:
lines.append(f"... and {len(warnings) - limit} more warnings")
return "\n".join(lines)
@staticmethod
def format_log(log: str, lines: int = 50) -> str:
"""
Format build log (show last N lines).
Args:
log: Full build log
lines: Number of lines to show
Returns:
Formatted log excerpt
"""
if not log:
return "No build log available."
log_lines = log.strip().split("\n")
if len(log_lines) <= lines:
return log
# Show last N lines
excerpt = log_lines[-lines:]
return f"... (showing last {lines} lines of {len(log_lines)})\n\n" + "\n".join(excerpt)
@staticmethod
def format_json(data: dict) -> str:
"""
Format data as JSON.
Args:
data: Data to format
Returns:
Pretty-printed JSON string
"""
return json.dumps(data, indent=2)
@staticmethod
def generate_hints(errors: list[dict]) -> list[str]:
"""
Generate actionable hints based on error types.
Args:
errors: List of error dicts
Returns:
List of hint strings
"""
hints = []
error_types: set[str] = set()
# Collect error types
for error in errors:
error_type = error.get("type", "unknown")
error_types.add(error_type)
# Generate hints based on error types
if "provisioning" in error_types:
hints.append("Provisioning profile issue detected:")
hints.append(" • Ensure you have a valid provisioning profile for iOS Simulator")
hints.append(
' • For simulator builds, use CODE_SIGN_IDENTITY="" CODE_SIGNING_REQUIRED=NO'
)
hints.append(" • Or specify simulator explicitly: --simulator 'iPhone 16 Pro'")
if "signing" in error_types:
hints.append("Code signing issue detected:")
hints.append(" • For simulator builds, code signing is not required")
hints.append(" • Ensure build settings target iOS Simulator, not physical device")
hints.append(" • Check destination: platform=iOS Simulator,name=<device>")
if not error_types or "build" in error_types:
# Generic hints when error type is unknown
if any("destination" in error.get("message", "").lower() for error in errors):
hints.append("Device selection issue detected:")
hints.append(" • List available simulators: xcrun simctl list devices available")
hints.append(" • Specify simulator: --simulator 'iPhone 16 Pro'")
return hints
@staticmethod
def format_verbose(
status: str,
error_count: int,
warning_count: int,
xcresult_id: str,
errors: list[dict] | None = None,
warnings: list[dict] | None = None,
test_info: dict | None = None,
) -> str:
"""
Format verbose output with error/warning details.
Args:
status: Build status
error_count: Error count
warning_count: Warning count
xcresult_id: XCResult ID
errors: Optional error list
warnings: Optional warning list
test_info: Optional test results
Returns:
Verbose formatted output
"""
lines = []
# Header
if test_info:
total = test_info.get("total", 0)
passed = test_info.get("passed", 0)
failed = test_info.get("failed", 0)
duration = test_info.get("duration", 0.0)
test_status = "PASS" if failed == 0 else "FAIL"
lines.append(f"Tests: {test_status}")
lines.append(f" Total: {total}")
lines.append(f" Passed: {passed}")
lines.append(f" Failed: {failed}")
lines.append(f" Duration: {duration:.1f}s")
else:
lines.append(f"Build: {status}")
lines.append(f"XCResult: {xcresult_id}")
lines.append("")
# Errors
if errors and len(errors) > 0:
lines.append(OutputFormatter.format_errors(errors, limit=5))
lines.append("")
# Warnings
if warnings and len(warnings) > 0:
lines.append(OutputFormatter.format_warnings(warnings, limit=5))
lines.append("")
# Summary
lines.append(f"Summary: {error_count} errors, {warning_count} warnings")
return "\n".join(lines)

View File

@@ -0,0 +1,404 @@
"""
XCResult bundle parser.
Extracts structured data from xcresult bundles using xcresulttool.
"""
import json
import re
import subprocess
import sys
from pathlib import Path
from typing import Any
class XCResultParser:
"""
Parse xcresult bundles to extract build/test data.
Uses xcresulttool to extract structured JSON data from Apple's
xcresult bundle format.
"""
def __init__(self, xcresult_path: Path, stderr: str = ""):
"""
Initialize parser.
Args:
xcresult_path: Path to xcresult bundle
stderr: Optional stderr output for fallback parsing
"""
self.xcresult_path = xcresult_path
self.stderr = stderr
if xcresult_path and not xcresult_path.exists():
raise FileNotFoundError(f"XCResult bundle not found: {xcresult_path}")
def get_build_results(self) -> dict | None:
"""
Get build results as JSON.
Returns:
Parsed JSON dict or None on error
"""
return self._run_xcresulttool(["get", "build-results"])
def get_test_results(self) -> dict | None:
"""
Get test results summary as JSON.
Returns:
Parsed JSON dict or None on error
"""
return self._run_xcresulttool(["get", "test-results", "summary"])
def get_build_log(self) -> str | None:
"""
Get build log as plain text.
Returns:
Build log string or None on error
"""
result = self._run_xcresulttool(["get", "log", "--type", "build"], parse_json=False)
return result if result else None
def count_issues(self) -> tuple[int, int]:
"""
Count errors and warnings from build results.
Returns:
Tuple of (error_count, warning_count)
"""
error_count = 0
warning_count = 0
build_results = self.get_build_results()
if build_results:
try:
# Try top-level errors/warnings first (newer xcresult format)
if "errors" in build_results and isinstance(build_results.get("errors"), list):
error_count = len(build_results["errors"])
if "warnings" in build_results and isinstance(build_results.get("warnings"), list):
warning_count = len(build_results["warnings"])
# If not found, try legacy format: actions[0].buildResult.issues
if error_count == 0 and warning_count == 0:
actions = build_results.get("actions", {}).get("_values", [])
if actions:
build_result = actions[0].get("buildResult", {})
issues = build_result.get("issues", {})
# Count errors
error_summaries = issues.get("errorSummaries", {}).get("_values", [])
error_count = len(error_summaries)
# Count warnings
warning_summaries = issues.get("warningSummaries", {}).get("_values", [])
warning_count = len(warning_summaries)
except (KeyError, IndexError, TypeError) as e:
print(f"Warning: Could not parse issue counts from xcresult: {e}", file=sys.stderr)
# If no errors found in xcresult but stderr available, count stderr errors
if error_count == 0 and self.stderr:
stderr_errors = self._parse_stderr_errors()
error_count = len(stderr_errors)
return (error_count, warning_count)
def get_errors(self) -> list[dict]:
"""
Get detailed error information.
Returns:
List of error dicts with message, file, line info
"""
build_results = self.get_build_results()
errors = []
# Try to get errors from xcresult
if build_results:
try:
# Try top-level errors first (newer xcresult format)
if "errors" in build_results and isinstance(build_results.get("errors"), list):
for error in build_results["errors"]:
errors.append(
{
"message": error.get("message", "Unknown error"),
"type": error.get("issueType", "error"),
"location": self._extract_location_from_url(error.get("sourceURL")),
}
)
# If not found, try legacy format: actions[0].buildResult.issues
if not errors:
actions = build_results.get("actions", {}).get("_values", [])
if actions:
build_result = actions[0].get("buildResult", {})
issues = build_result.get("issues", {})
error_summaries = issues.get("errorSummaries", {}).get("_values", [])
for error in error_summaries:
errors.append(
{
"message": error.get("message", {}).get(
"_value", "Unknown error"
),
"type": error.get("issueType", {}).get("_value", "error"),
"location": self._extract_location(error),
}
)
except (KeyError, IndexError, TypeError) as e:
print(f"Warning: Could not parse errors from xcresult: {e}", file=sys.stderr)
# If no errors found in xcresult but stderr available, parse stderr
if not errors and self.stderr:
errors = self._parse_stderr_errors()
return errors
def get_warnings(self) -> list[dict]:
"""
Get detailed warning information.
Returns:
List of warning dicts with message, file, line info
"""
build_results = self.get_build_results()
if not build_results:
return []
warnings = []
try:
# Try top-level warnings first (newer xcresult format)
if "warnings" in build_results and isinstance(build_results.get("warnings"), list):
for warning in build_results["warnings"]:
warnings.append(
{
"message": warning.get("message", "Unknown warning"),
"type": warning.get("issueType", "warning"),
"location": self._extract_location_from_url(warning.get("sourceURL")),
}
)
# If not found, try legacy format: actions[0].buildResult.issues
if not warnings:
actions = build_results.get("actions", {}).get("_values", [])
if not actions:
return []
build_result = actions[0].get("buildResult", {})
issues = build_result.get("issues", {})
warning_summaries = issues.get("warningSummaries", {}).get("_values", [])
for warning in warning_summaries:
warnings.append(
{
"message": warning.get("message", {}).get("_value", "Unknown warning"),
"type": warning.get("issueType", {}).get("_value", "warning"),
"location": self._extract_location(warning),
}
)
except (KeyError, IndexError, TypeError) as e:
print(f"Warning: Could not parse warnings: {e}", file=sys.stderr)
return warnings
def _extract_location(self, issue: dict) -> dict:
"""
Extract file location from issue.
Args:
issue: Issue dict from xcresult
Returns:
Location dict with file, line, column
"""
location = {"file": None, "line": None, "column": None}
try:
doc_location = issue.get("documentLocationInCreatingWorkspace", {})
location["file"] = doc_location.get("url", {}).get("_value")
location["line"] = doc_location.get("startingLineNumber", {}).get("_value")
location["column"] = doc_location.get("startingColumnNumber", {}).get("_value")
except (KeyError, TypeError):
pass
return location
def _extract_location_from_url(self, source_url: str | None) -> dict:
"""
Extract file location from sourceURL (newer xcresult format).
Args:
source_url: Source URL like "file:///path/to/file.swift#StartingLineNumber=134&..."
Returns:
Location dict with file, line, column
"""
location = {"file": None, "line": None, "column": None}
if not source_url:
return location
try:
# Split URL and fragment
if "#" in source_url:
file_part, fragment = source_url.split("#", 1)
# Extract file path
location["file"] = file_part.replace("file://", "")
# Parse fragment parameters
params = {}
for param in fragment.split("&"):
if "=" in param:
key, value = param.split("=", 1)
params[key] = value
# Extract line and column
location["line"] = (
int(params.get("StartingLineNumber", 0)) + 1
if "StartingLineNumber" in params
else None
)
location["column"] = (
int(params.get("StartingColumnNumber", 0)) + 1
if "StartingColumnNumber" in params
else None
)
else:
# No fragment, just file path
location["file"] = source_url.replace("file://", "")
except (ValueError, AttributeError):
pass
return location
def _run_xcresulttool(self, args: list[str], parse_json: bool = True) -> Any | None:
"""
Run xcresulttool command.
Args:
args: Command arguments (after 'xcresulttool')
parse_json: Whether to parse output as JSON
Returns:
Parsed JSON dict, plain text, or None on error
"""
if not self.xcresult_path:
return None
cmd = ["xcrun", "xcresulttool"] + args + ["--path", str(self.xcresult_path)]
try:
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
if parse_json:
return json.loads(result.stdout)
return result.stdout
except subprocess.CalledProcessError as e:
print(f"Error running xcresulttool: {e}", file=sys.stderr)
print(f"stderr: {e.stderr}", file=sys.stderr)
return None
except json.JSONDecodeError as e:
print(f"Error parsing JSON from xcresulttool: {e}", file=sys.stderr)
return None
def _parse_stderr_errors(self) -> list[dict]:
"""
Parse common errors from stderr output as fallback.
Returns:
List of error dicts parsed from stderr
"""
errors = []
if not self.stderr:
return errors
# Pattern 0: Swift/Clang compilation errors (e.g., "/path/file.swift:135:59: error: message")
compilation_error_pattern = (
r"^(?P<file>[^:]+):(?P<line>\d+):(?P<column>\d+):\s*error:\s*(?P<message>.+?)$"
)
for match in re.finditer(compilation_error_pattern, self.stderr, re.MULTILINE):
errors.append(
{
"message": match.group("message").strip(),
"type": "compilation",
"location": {
"file": match.group("file"),
"line": int(match.group("line")),
"column": int(match.group("column")),
},
}
)
# Pattern 1: xcodebuild top-level errors (e.g., "xcodebuild: error: Unable to find...")
xcodebuild_error_pattern = r"xcodebuild:\s*error:\s*(?P<message>.*?)(?:\n\n|\Z)"
for match in re.finditer(xcodebuild_error_pattern, self.stderr, re.DOTALL):
message = match.group("message").strip()
# Clean up multi-line messages
message = " ".join(line.strip() for line in message.split("\n") if line.strip())
errors.append(
{
"message": message,
"type": "build",
"location": {"file": None, "line": None, "column": None},
}
)
# Pattern 2: Provisioning profile errors
provisioning_pattern = r"error:.*?provisioning profile.*?(?:doesn't|does not|cannot).*?(?P<message>.*?)(?:\n|$)"
for match in re.finditer(provisioning_pattern, self.stderr, re.IGNORECASE):
errors.append(
{
"message": f"Provisioning profile error: {match.group('message').strip()}",
"type": "provisioning",
"location": {"file": None, "line": None, "column": None},
}
)
# Pattern 3: Code signing errors
signing_pattern = r"error:.*?(?:code sign|signing).*?(?P<message>.*?)(?:\n|$)"
for match in re.finditer(signing_pattern, self.stderr, re.IGNORECASE):
errors.append(
{
"message": f"Code signing error: {match.group('message').strip()}",
"type": "signing",
"location": {"file": None, "line": None, "column": None},
}
)
# Pattern 4: Generic compilation errors (but not if already captured)
if not errors:
generic_error_pattern = r"^(?:\*\*\s)?(?:error|❌):\s*(?P<message>.*?)(?:\n|$)"
for match in re.finditer(generic_error_pattern, self.stderr, re.MULTILINE):
message = match.group("message").strip()
errors.append(
{
"message": message,
"type": "build",
"location": {"file": None, "line": None, "column": None},
}
)
# Pattern 5: Specific "No profiles" error
if "No profiles for" in self.stderr:
no_profile_pattern = r"No profiles for '(?P<bundle_id>.*?)' were found"
for match in re.finditer(no_profile_pattern, self.stderr):
errors.append(
{
"message": f"No provisioning profile found for bundle ID '{match.group('bundle_id')}'",
"type": "provisioning",
"location": {"file": None, "line": None, "column": None},
}
)
return errors


@@ -0,0 +1,30 @@
© 2025 Anthropic, PBC. All rights reserved.
LICENSE: Use of these materials (including all code, prompts, assets, files,
and other components of this Skill) is governed by your agreement with
Anthropic regarding use of Anthropic's services. If no separate agreement
exists, use is governed by Anthropic's Consumer Terms of Service or
Commercial Terms of Service, as applicable:
https://www.anthropic.com/legal/consumer-terms
https://www.anthropic.com/legal/commercial-terms
Your applicable agreement is referred to as the "Agreement." "Services" are
as defined in the Agreement.
ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the
contrary, users may not:
- Extract these materials from the Services or retain copies of these
materials outside the Services
- Reproduce or copy these materials, except for temporary copies created
automatically during authorized use of the Services
- Create derivative works based on these materials
- Distribute, sublicense, or transfer these materials to any third party
- Make, offer to sell, sell, or import any inventions embodied in these
materials
- Reverse engineer, decompile, or disassemble these materials
The receipt, viewing, or possession of these materials does not convey or
imply any license or right beyond those expressly granted above.
Anthropic retains all right, title, and interest in these materials,
including all copyrights, patents, and other intellectual property rights.

.agents/skills/pdf/SKILL.md Normal file

@@ -0,0 +1,314 @@
---
name: pdf
description: Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill.
license: Proprietary. LICENSE.txt has complete terms
---
# PDF Processing Guide
## Overview
This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see REFERENCE.md. If you need to fill out a PDF form, read FORMS.md and follow its instructions.
## Quick Start
```python
from pypdf import PdfReader, PdfWriter
# Read a PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")
# Extract text
text = ""
for page in reader.pages:
text += page.extract_text()
```
## Python Libraries
### pypdf - Basic Operations
#### Merge PDFs
```python
from pypdf import PdfWriter, PdfReader
writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
reader = PdfReader(pdf_file)
for page in reader.pages:
writer.add_page(page)
with open("merged.pdf", "wb") as output:
writer.write(output)
```
#### Split PDF
```python
reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
writer = PdfWriter()
writer.add_page(page)
with open(f"page_{i+1}.pdf", "wb") as output:
writer.write(output)
```
#### Extract Metadata
```python
reader = PdfReader("document.pdf")
meta = reader.metadata
print(f"Title: {meta.title}")
print(f"Author: {meta.author}")
print(f"Subject: {meta.subject}")
print(f"Creator: {meta.creator}")
```
#### Rotate Pages
```python
reader = PdfReader("input.pdf")
writer = PdfWriter()
page = reader.pages[0]
page.rotate(90) # Rotate 90 degrees clockwise
writer.add_page(page)
with open("rotated.pdf", "wb") as output:
writer.write(output)
```
### pdfplumber - Text and Table Extraction
#### Extract Text with Layout
```python
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
for page in pdf.pages:
text = page.extract_text()
print(text)
```
#### Extract Tables
```python
with pdfplumber.open("document.pdf") as pdf:
for i, page in enumerate(pdf.pages):
tables = page.extract_tables()
for j, table in enumerate(tables):
print(f"Table {j+1} on page {i+1}:")
for row in table:
print(row)
```
#### Advanced Table Extraction
```python
import pandas as pd
with pdfplumber.open("document.pdf") as pdf:
all_tables = []
for page in pdf.pages:
tables = page.extract_tables()
for table in tables:
if table: # Check if table is not empty
df = pd.DataFrame(table[1:], columns=table[0])
all_tables.append(df)
# Combine all tables
if all_tables:
combined_df = pd.concat(all_tables, ignore_index=True)
combined_df.to_excel("extracted_tables.xlsx", index=False)
```
### reportlab - Create PDFs
#### Basic PDF Creation
```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
c = canvas.Canvas("hello.pdf", pagesize=letter)
width, height = letter
# Add text
c.drawString(100, height - 100, "Hello World!")
c.drawString(100, height - 120, "This is a PDF created with reportlab")
# Add a line
c.line(100, height - 140, 400, height - 140)
# Save
c.save()
```
#### Create PDF with Multiple Pages
```python
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet
doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []
# Add content
title = Paragraph("Report Title", styles['Title'])
story.append(title)
story.append(Spacer(1, 12))
body = Paragraph("This is the body of the report. " * 20, styles['Normal'])
story.append(body)
story.append(PageBreak())
# Page 2
story.append(Paragraph("Page 2", styles['Heading1']))
story.append(Paragraph("Content for page 2", styles['Normal']))
# Build PDF
doc.build(story)
```
#### Subscripts and Superscripts
**IMPORTANT**: Never use Unicode subscript/superscript characters (₀₁₂₃₄₅₆₇₈₉, ⁰¹²³⁴⁵⁶⁷⁸⁹) in ReportLab PDFs. The built-in fonts do not include these glyphs, causing them to render as solid black boxes.
Instead, use ReportLab's XML markup tags in Paragraph objects:
```python
from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet
styles = getSampleStyleSheet()
# Subscripts: use <sub> tag
chemical = Paragraph("H<sub>2</sub>O", styles['Normal'])
# Superscripts: use <super> tag
squared = Paragraph("x<super>2</super> + y<super>2</super>", styles['Normal'])
```
For canvas-drawn text (not Paragraph objects), manually adjust the font size and position rather than using Unicode subscripts/superscripts.
## Command-Line Tools
### pdftotext (poppler-utils)
```bash
# Extract text
pdftotext input.pdf output.txt
# Extract text preserving layout
pdftotext -layout input.pdf output.txt
# Extract specific pages
pdftotext -f 1 -l 5 input.pdf output.txt # Pages 1-5
```
### qpdf
```bash
# Merge PDFs
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf
# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
qpdf input.pdf --pages . 6-10 -- pages6-10.pdf
# Rotate pages
qpdf input.pdf output.pdf --rotate=+90:1 # Rotate page 1 by 90 degrees
# Remove password
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
```
### pdftk (if available)
```bash
# Merge
pdftk file1.pdf file2.pdf cat output merged.pdf
# Split
pdftk input.pdf burst
# Rotate
pdftk input.pdf rotate 1east output rotated.pdf
```
## Common Tasks
### Extract Text from Scanned PDFs
```python
# Requires: pip install pytesseract pdf2image
import pytesseract
from pdf2image import convert_from_path
# Convert PDF to images
images = convert_from_path('scanned.pdf')
# OCR each page
text = ""
for i, image in enumerate(images):
text += f"Page {i+1}:\n"
text += pytesseract.image_to_string(image)
text += "\n\n"
print(text)
```
### Add Watermark
```python
from pypdf import PdfReader, PdfWriter
# Create watermark (or load existing)
watermark = PdfReader("watermark.pdf").pages[0]
# Apply to all pages
reader = PdfReader("document.pdf")
writer = PdfWriter()
for page in reader.pages:
page.merge_page(watermark)
writer.add_page(page)
with open("watermarked.pdf", "wb") as output:
writer.write(output)
```
### Extract Images
```bash
# Using pdfimages (poppler-utils)
pdfimages -j input.pdf output_prefix
# This extracts all images as output_prefix-000.jpg, output_prefix-001.jpg, etc.
```
### Password Protection
```python
from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
writer = PdfWriter()
for page in reader.pages:
writer.add_page(page)
# Add password
writer.encrypt("userpassword", "ownerpassword")
with open("encrypted.pdf", "wb") as output:
writer.write(output)
```
## Quick Reference
| Task | Best Tool | Command/Code |
|------|-----------|--------------|
| Merge PDFs | pypdf | `writer.add_page(page)` |
| Split PDFs | pypdf | One page per file |
| Extract text | pdfplumber | `page.extract_text()` |
| Extract tables | pdfplumber | `page.extract_tables()` |
| Create PDFs | reportlab | Canvas or Platypus |
| Command line merge | qpdf | `qpdf --empty --pages ...` |
| OCR scanned PDFs | pytesseract | Convert to image first |
| Fill PDF forms | pdf-lib or pypdf (see FORMS.md) | See FORMS.md |
## Next Steps
- For advanced pypdfium2 usage, see REFERENCE.md
- For JavaScript libraries (pdf-lib), see REFERENCE.md
- If you need to fill out a PDF form, follow the instructions in FORMS.md
- For troubleshooting guides, see REFERENCE.md


@@ -0,0 +1,5 @@
interface:
display_name: "PDF Skill"
short_description: "Create, edit, and review PDFs"
icon_large: "./assets/pdf.png"
default_prompt: "Create, edit, or review this PDF and summarize the key output or changes."


.agents/skills/pdf/forms.md Normal file

@@ -0,0 +1,294 @@
**CRITICAL: You MUST complete these steps in order. Do not skip ahead to writing code.**
If you need to fill out a PDF form, first check to see if the PDF has fillable form fields. Run this script from this file's directory:
`python scripts/check_fillable_fields.py <file.pdf>`, and depending on the result go to either the "Fillable fields" or "Non-fillable fields" section below and follow those instructions.
# Fillable fields
If the PDF has fillable form fields:
- Run this script from this file's directory: `python scripts/extract_form_field_info.py <input.pdf> <field_info.json>`. It will create a JSON file with a list of fields in this format:
```
[
{
"field_id": (unique ID for the field),
"page": (page number, 1-based),
"rect": ([left, bottom, right, top] bounding box in PDF coordinates, y=0 is the bottom of the page),
"type": ("text", "checkbox", "radio_group", or "choice"),
},
// Checkboxes have "checked_value" and "unchecked_value" properties:
{
"field_id": (unique ID for the field),
"page": (page number, 1-based),
"type": "checkbox",
"checked_value": (Set the field to this value to check the checkbox),
"unchecked_value": (Set the field to this value to uncheck the checkbox),
},
// Radio groups have a "radio_options" list with the possible choices.
{
"field_id": (unique ID for the field),
"page": (page number, 1-based),
"type": "radio_group",
"radio_options": [
{
"value": (set the field to this value to select this radio option),
"rect": (bounding box for the radio button for this option)
},
// Other radio options
]
},
// Multiple choice fields have a "choice_options" list with the possible choices:
{
"field_id": (unique ID for the field),
"page": (page number, 1-based),
"type": "choice",
"choice_options": [
{
"value": (set the field to this value to select this option),
"text": (display text of the option)
},
// Other choice options
],
}
]
```
- Convert the PDF to PNGs (one image for each page) with this script (run from this file's directory):
`python scripts/convert_pdf_to_images.py <file.pdf> <output_directory>`
Then analyze the images to determine the purpose of each form field (make sure to convert the bounding box PDF coordinates to image coordinates).
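The PDF-to-image conversion mentioned above can be sketched as follows. This is a minimal illustration, not part of the provided scripts: it assumes the `[left, bottom, right, top]` rect layout from field_info.json (y=0 at the bottom of the page) and a uniform `scale` factor equal to `image_width / pdf_width`:

```python
def pdf_rect_to_image(rect, pdf_height, scale):
    """Map a [left, bottom, right, top] PDF rect (y-up) to
    [x0, y0, x1, y1] image pixel coordinates (y-down)."""
    left, bottom, right, top = rect
    return [
        left * scale,
        (pdf_height - top) * scale,     # image y0 comes from the PDF top edge
        right * scale,
        (pdf_height - bottom) * scale,  # image y1 comes from the PDF bottom edge
    ]

# Example: a field near the top of a US Letter page (792 pt tall),
# rendered at 2x scale:
print(pdf_rect_to_image([100, 700, 250, 720], 792, 2.0))  # [200.0, 144.0, 500.0, 184.0]
```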
- Create a `field_values.json` file in this format with the values to be entered for each field:
```
[
{
"field_id": "last_name", // Must match the field_id from `extract_form_field_info.py`
"description": "The user's last name",
"page": 1, // Must match the "page" value in field_info.json
"value": "Simpson"
},
{
"field_id": "Checkbox12",
"description": "Checkbox to be checked if the user is 18 or over",
"page": 1,
"value": "/On" // If this is a checkbox, use its "checked_value" value to check it. If it's a radio button group, use one of the "value" values in "radio_options".
},
// more fields
]
```
- Run the `fill_fillable_fields.py` script from this file's directory to create a filled-in PDF:
`python scripts/fill_fillable_fields.py <input pdf> <field_values.json> <output pdf>`
This script will verify that the field IDs and values you provide are valid; if it prints error messages, correct the appropriate fields and try again.
# Non-fillable fields
If the PDF doesn't have fillable form fields, you'll add text annotations. First try to extract coordinates from the PDF structure (more accurate), then fall back to visual estimation if needed.
## Step 1: Try Structure Extraction First
Run this script to extract text labels, lines, and checkboxes with their exact PDF coordinates:
`python scripts/extract_form_structure.py <input.pdf> form_structure.json`
This creates a JSON file containing:
- **labels**: Every text element with exact coordinates (x0, top, x1, bottom in PDF points)
- **lines**: Horizontal lines that define row boundaries
- **checkboxes**: Small square rectangles that are checkboxes (with center coordinates)
- **row_boundaries**: Row top/bottom positions calculated from horizontal lines
**Check the results**: If `form_structure.json` has meaningful labels (text elements that correspond to form fields), use **Approach A: Structure-Based Coordinates**. If the PDF is scanned/image-based and has few or no labels, use **Approach B: Visual Estimation**.
---
## Approach A: Structure-Based Coordinates (Preferred)
Use this when `extract_form_structure.py` found text labels in the PDF.
### A.1: Analyze the Structure
Read form_structure.json and identify:
1. **Label groups**: Adjacent text elements that form a single label (e.g., "Last" + "Name")
2. **Row structure**: Labels with similar `top` values are in the same row
3. **Field columns**: Entry areas start after label ends (x0 = label.x1 + gap)
4. **Checkboxes**: Use the checkbox coordinates directly from the structure
**Coordinate system**: PDF coordinates where y=0 is at TOP of page, y increases downward.
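The row-grouping step (item 2 above) can be sketched as follows. The label dicts use the `top`/`x0` keys produced by extract_form_structure.py; the tolerance value is an illustrative assumption, not something the scripts prescribe:

```python
def group_into_rows(labels, tol=4):
    """Group label dicts into rows: a label joins the current row when
    its `top` is within `tol` points of that row's first label."""
    rows = []
    for lab in sorted(labels, key=lambda l: l["top"]):
        if rows and abs(lab["top"] - rows[-1][0]["top"]) <= tol:
            rows[-1].append(lab)
        else:
            rows.append([lab])
    # Order each row left-to-right so adjacent fragments can be joined
    return [sorted(row, key=lambda l: l["x0"]) for row in rows]

labels = [
    {"text": "Name", "top": 63, "x0": 43},
    {"text": "Last", "top": 64, "x0": 20},
    {"text": "Address", "top": 90, "x0": 20},
]
print([[l["text"] for l in row] for row in group_into_rows(labels)])
# [['Last', 'Name'], ['Address']]
```

Left-to-right ordering within a row is what lets adjacent fragments like "Last" + "Name" be merged into a single label.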
### A.2: Check for Missing Elements
The structure extraction may not detect all form elements. Common cases:
- **Circular checkboxes**: Only square rectangles are detected as checkboxes
- **Complex graphics**: Decorative elements or non-standard form controls
- **Faded or light-colored elements**: May not be extracted
If you see form fields in the PDF images that aren't in form_structure.json, you'll need to use **visual analysis** for those specific fields (see "Hybrid Approach" below).
### A.3: Create fields.json with PDF Coordinates
For each field, calculate entry coordinates from the extracted structure:
**Text fields:**
- entry x0 = label x1 + 5 (small gap after label)
- entry x1 = next label's x0, or row boundary
- entry top = same as label top
- entry bottom = row boundary line below, or label bottom + row_height
**Checkboxes:**
- Use the checkbox rectangle coordinates directly from form_structure.json
- entry_bounding_box = [checkbox.x0, checkbox.top, checkbox.x1, checkbox.bottom]
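The text-field rules above can be sketched as a small helper (names are illustrative; `label` is a dict with the `x0`/`x1`/`top`/`bottom` keys from form_structure.json):

```python
GAP = 5  # points of space left after the label

def text_entry_box(label, next_label_x0, row_bottom):
    """Entry area for a text field, in PDF coords (y=0 at top)."""
    return [
        label["x1"] + GAP,  # start just after the label
        label["top"],
        next_label_x0,      # end where the next label (or row boundary) begins
        row_bottom,         # bottom of the row from the line grid
    ]

print(text_entry_box({"x0": 43, "x1": 87, "top": 63, "bottom": 73}, 260, 79))
# [92, 63, 260, 79]
```

Note the output matches the "Last name" entry_bounding_box in the fields.json example below.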
Create fields.json using `pdf_width` and `pdf_height` (signals PDF coordinates):
```json
{
"pages": [
{"page_number": 1, "pdf_width": 612, "pdf_height": 792}
],
"form_fields": [
{
"page_number": 1,
"description": "Last name entry field",
"field_label": "Last Name",
"label_bounding_box": [43, 63, 87, 73],
"entry_bounding_box": [92, 63, 260, 79],
"entry_text": {"text": "Smith", "font_size": 10}
},
{
"page_number": 1,
"description": "US Citizen Yes checkbox",
"field_label": "Yes",
"label_bounding_box": [260, 200, 280, 210],
"entry_bounding_box": [285, 197, 292, 205],
"entry_text": {"text": "X"}
}
]
}
```
**Important**: Use `pdf_width`/`pdf_height` and coordinates directly from form_structure.json.
### A.4: Validate Bounding Boxes
Before filling, check your bounding boxes for errors:
`python scripts/check_bounding_boxes.py fields.json`
This checks for intersecting bounding boxes and entry boxes that are too small for the font size. Fix any reported errors before filling.
---
## Approach B: Visual Estimation (Fallback)
Use this when the PDF is scanned/image-based and structure extraction found no usable text labels (e.g., all text shows as "(cid:X)" patterns).
### B.1: Convert PDF to Images
`python scripts/convert_pdf_to_images.py <input.pdf> <images_dir/>`
### B.2: Initial Field Identification
Examine each page image to identify form sections and get **rough estimates** of field locations:
- Form field labels and their approximate positions
- Entry areas (lines, boxes, or blank spaces for text input)
- Checkboxes and their approximate locations
For each field, note approximate pixel coordinates (they don't need to be precise yet).
### B.3: Zoom Refinement (CRITICAL for accuracy)
For each field, crop a region around the estimated position to refine coordinates precisely.
**Create a zoomed crop using ImageMagick:**
```bash
magick <page_image> -crop <width>x<height>+<x>+<y> +repage <crop_output.png>
```
Where:
- `<x>, <y>` = top-left corner of crop region (use your rough estimate minus padding)
- `<width>, <height>` = size of crop region (field area plus ~50px padding on each side)
**Example:** To refine a "Name" field estimated around (100, 150):
```bash
magick images_dir/page_1.png -crop 300x80+50+120 +repage crops/name_field.png
```
(Note: if the `magick` command isn't available, try `convert` with the same arguments).
**Examine the cropped image** to determine precise coordinates:
1. Identify the exact pixel where the entry area begins (after the label)
2. Identify where the entry area ends (before next field or edge)
3. Identify the top and bottom of the entry line/box
**Convert crop coordinates back to full image coordinates:**
- full_x = crop_x + crop_offset_x
- full_y = crop_y + crop_offset_y
Example: If the crop started at (50, 120) and the entry box starts at (52, 18) within the crop:
- entry_x0 = 52 + 50 = 102
- entry_top = 18 + 120 = 138
**Repeat for each field**, grouping nearby fields into single crops when possible.
### B.4: Create fields.json with Refined Coordinates
Create fields.json using `image_width` and `image_height` (signals image coordinates):
```json
{
"pages": [
{"page_number": 1, "image_width": 1700, "image_height": 2200}
],
"form_fields": [
{
"page_number": 1,
"description": "Last name entry field",
"field_label": "Last Name",
"label_bounding_box": [120, 175, 242, 198],
"entry_bounding_box": [255, 175, 720, 218],
"entry_text": {"text": "Smith", "font_size": 10}
}
]
}
```
**Important**: Use `image_width`/`image_height` and the refined pixel coordinates from the zoom analysis.
### B.5: Validate Bounding Boxes
Before filling, check your bounding boxes for errors:
`python scripts/check_bounding_boxes.py fields.json`
This checks for intersecting bounding boxes and entry boxes that are too small for the font size. Fix any reported errors before filling.
---
## Hybrid Approach: Structure + Visual
Use this when structure extraction works for most fields but misses some elements (e.g., circular checkboxes, unusual form controls).
1. **Use Approach A** for fields that were detected in form_structure.json
2. **Convert PDF to images** for visual analysis of missing fields
3. **Use zoom refinement** (from Approach B) for the missing fields
4. **Combine coordinates**: For fields from structure extraction, use `pdf_width`/`pdf_height`. For visually-estimated fields, you must convert image coordinates to PDF coordinates:
- pdf_x = image_x * (pdf_width / image_width)
- pdf_y = image_y * (pdf_height / image_height)
5. **Use a single coordinate system** in fields.json - convert all to PDF coordinates with `pdf_width`/`pdf_height`
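The conversion in step 4 amounts to scaling by the ratio of page size to image size; a minimal sketch:

```python
def image_to_pdf_coords(x, y, pdf_size, image_size):
    """Scale image pixel coordinates to PDF points (both systems
    top-left origin, as fields.json uses them)."""
    pdf_w, pdf_h = pdf_size
    img_w, img_h = image_size
    return x * pdf_w / img_w, y * pdf_h / img_h

# A pixel at (255, 175) on a 1700x2200 render of a 612x792 page:
print(image_to_pdf_coords(255, 175, (612, 792), (1700, 2200)))  # (91.8, 63.0)
```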
---
## Step 2: Validate Before Filling
**Always validate bounding boxes before filling:**
`python scripts/check_bounding_boxes.py fields.json`
This checks for:
- Intersecting bounding boxes (which would cause overlapping text)
- Entry boxes that are too small for the specified font size
Fix any reported errors in fields.json before proceeding.
## Step 3: Fill the Form
The fill script auto-detects the coordinate system and handles conversion:
`python scripts/fill_pdf_form_with_annotations.py <input.pdf> fields.json <output.pdf>`
## Step 4: Verify Output
Convert the filled PDF to images and verify text placement:
`python scripts/convert_pdf_to_images.py <output.pdf> <verify_images/>`
If text is mispositioned:
- **Approach A**: Check that you're using PDF coordinates from form_structure.json with `pdf_width`/`pdf_height`
- **Approach B**: Check that image dimensions match and coordinates are accurate pixels
- **Hybrid**: Ensure coordinate conversions are correct for visually-estimated fields


@@ -0,0 +1,612 @@
# PDF Processing Advanced Reference
This document contains advanced PDF processing features, detailed examples, and additional libraries not covered in the main skill instructions.
## pypdfium2 Library (Apache/BSD License)
### Overview
pypdfium2 is a Python binding for PDFium (Chromium's PDF library). It's excellent for fast PDF rendering, image generation, and serves as a PyMuPDF replacement.
### Render PDF to Images
```python
import pypdfium2 as pdfium
from PIL import Image
# Load PDF
pdf = pdfium.PdfDocument("document.pdf")
# Render page to image
page = pdf[0] # First page
bitmap = page.render(
scale=2.0, # Higher resolution
rotation=0 # No rotation
)
# Convert to PIL Image
img = bitmap.to_pil()
img.save("page_1.png", "PNG")
# Process multiple pages
for i, page in enumerate(pdf):
bitmap = page.render(scale=1.5)
img = bitmap.to_pil()
img.save(f"page_{i+1}.jpg", "JPEG", quality=90)
```
### Extract Text with pypdfium2
```python
import pypdfium2 as pdfium
pdf = pdfium.PdfDocument("document.pdf")
for i, page in enumerate(pdf):
    textpage = page.get_textpage()
    text = textpage.get_text_range()
    print(f"Page {i+1} text length: {len(text)} chars")
```
## JavaScript Libraries
### pdf-lib (MIT License)
pdf-lib is a powerful JavaScript library for creating and modifying PDF documents in any JavaScript environment.
#### Load and Manipulate Existing PDF
```javascript
import { PDFDocument } from 'pdf-lib';
import fs from 'fs';
async function manipulatePDF() {
// Load existing PDF
const existingPdfBytes = fs.readFileSync('input.pdf');
const pdfDoc = await PDFDocument.load(existingPdfBytes);
// Get page count
const pageCount = pdfDoc.getPageCount();
console.log(`Document has ${pageCount} pages`);
// Add new page
const newPage = pdfDoc.addPage([600, 400]);
newPage.drawText('Added by pdf-lib', {
x: 100,
y: 300,
size: 16
});
// Save modified PDF
const pdfBytes = await pdfDoc.save();
fs.writeFileSync('modified.pdf', pdfBytes);
}
```
#### Create Complex PDFs from Scratch
```javascript
import { PDFDocument, rgb, StandardFonts } from 'pdf-lib';
import fs from 'fs';
async function createPDF() {
const pdfDoc = await PDFDocument.create();
// Add fonts
const helveticaFont = await pdfDoc.embedFont(StandardFonts.Helvetica);
const helveticaBold = await pdfDoc.embedFont(StandardFonts.HelveticaBold);
// Add page
const page = pdfDoc.addPage([595, 842]); // A4 size
const { width, height } = page.getSize();
// Add text with styling
page.drawText('Invoice #12345', {
x: 50,
y: height - 50,
size: 18,
font: helveticaBold,
color: rgb(0.2, 0.2, 0.8)
});
// Add rectangle (header background)
page.drawRectangle({
x: 40,
y: height - 100,
width: width - 80,
height: 30,
color: rgb(0.9, 0.9, 0.9)
});
// Add table-like content
const items = [
['Item', 'Qty', 'Price', 'Total'],
['Widget', '2', '$50', '$100'],
['Gadget', '1', '$75', '$75']
];
let yPos = height - 150;
items.forEach(row => {
let xPos = 50;
row.forEach(cell => {
page.drawText(cell, {
x: xPos,
y: yPos,
size: 12,
font: helveticaFont
});
xPos += 120;
});
yPos -= 25;
});
const pdfBytes = await pdfDoc.save();
fs.writeFileSync('created.pdf', pdfBytes);
}
```
#### Advanced Merge and Split Operations
```javascript
import { PDFDocument } from 'pdf-lib';
import fs from 'fs';
async function mergePDFs() {
// Create new document
const mergedPdf = await PDFDocument.create();
// Load source PDFs
const pdf1Bytes = fs.readFileSync('doc1.pdf');
const pdf2Bytes = fs.readFileSync('doc2.pdf');
const pdf1 = await PDFDocument.load(pdf1Bytes);
const pdf2 = await PDFDocument.load(pdf2Bytes);
// Copy pages from first PDF
const pdf1Pages = await mergedPdf.copyPages(pdf1, pdf1.getPageIndices());
pdf1Pages.forEach(page => mergedPdf.addPage(page));
// Copy specific pages from second PDF (pages 0, 2, 4)
const pdf2Pages = await mergedPdf.copyPages(pdf2, [0, 2, 4]);
pdf2Pages.forEach(page => mergedPdf.addPage(page));
const mergedPdfBytes = await mergedPdf.save();
fs.writeFileSync('merged.pdf', mergedPdfBytes);
}
```
### pdfjs-dist (Apache License)
PDF.js is Mozilla's JavaScript library for rendering PDFs in the browser.
#### Basic PDF Loading and Rendering
```javascript
import * as pdfjsLib from 'pdfjs-dist';
// Configure worker (important for performance)
pdfjsLib.GlobalWorkerOptions.workerSrc = './pdf.worker.js';
async function renderPDF() {
// Load PDF
const loadingTask = pdfjsLib.getDocument('document.pdf');
const pdf = await loadingTask.promise;
console.log(`Loaded PDF with ${pdf.numPages} pages`);
// Get first page
const page = await pdf.getPage(1);
const viewport = page.getViewport({ scale: 1.5 });
// Render to canvas
const canvas = document.createElement('canvas');
const context = canvas.getContext('2d');
canvas.height = viewport.height;
canvas.width = viewport.width;
const renderContext = {
canvasContext: context,
viewport: viewport
};
await page.render(renderContext).promise;
document.body.appendChild(canvas);
}
```
#### Extract Text with Coordinates
```javascript
import * as pdfjsLib from 'pdfjs-dist';
async function extractText() {
const loadingTask = pdfjsLib.getDocument('document.pdf');
const pdf = await loadingTask.promise;
let fullText = '';
// Extract text from all pages
for (let i = 1; i <= pdf.numPages; i++) {
const page = await pdf.getPage(i);
const textContent = await page.getTextContent();
const pageText = textContent.items
.map(item => item.str)
.join(' ');
fullText += `\n--- Page ${i} ---\n${pageText}`;
// Get text with coordinates for advanced processing
const textWithCoords = textContent.items.map(item => ({
text: item.str,
x: item.transform[4],
y: item.transform[5],
width: item.width,
height: item.height
}));
}
console.log(fullText);
return fullText;
}
```
#### Extract Annotations and Forms
```javascript
import * as pdfjsLib from 'pdfjs-dist';
async function extractAnnotations() {
const loadingTask = pdfjsLib.getDocument('annotated.pdf');
const pdf = await loadingTask.promise;
for (let i = 1; i <= pdf.numPages; i++) {
const page = await pdf.getPage(i);
const annotations = await page.getAnnotations();
annotations.forEach(annotation => {
console.log(`Annotation type: ${annotation.subtype}`);
console.log(`Content: ${annotation.contents}`);
console.log(`Coordinates: ${JSON.stringify(annotation.rect)}`);
});
}
}
```
## Advanced Command-Line Operations
### poppler-utils Advanced Features
#### Extract Text with Bounding Box Coordinates
```bash
# Extract text with bounding box coordinates (essential for structured data)
pdftotext -bbox-layout document.pdf output.xml
# The XML output contains precise coordinates for each text element
```
#### Advanced Image Conversion
```bash
# Convert to PNG images with specific resolution
pdftoppm -png -r 300 document.pdf output_prefix
# Convert specific page range with high resolution
pdftoppm -png -r 600 -f 1 -l 3 document.pdf high_res_pages
# Convert to JPEG with quality setting
pdftoppm -jpeg -jpegopt quality=85 -r 200 document.pdf jpeg_output
```
#### Extract Embedded Images
```bash
# Extract all embedded images with metadata
pdfimages -j -p document.pdf page_images
# List image info without extracting
pdfimages -list document.pdf
# Extract images in their original format
pdfimages -all document.pdf images/img
```
### qpdf Advanced Features
#### Complex Page Manipulation
```bash
# Split PDF into groups of pages
qpdf --split-pages=3 input.pdf output_group_%02d.pdf
# Extract specific pages with complex ranges ("z" means the last page)
qpdf input.pdf --pages input.pdf 1,3-5,8,10-z -- extracted.pdf
# Merge specific pages from multiple PDFs
qpdf --empty --pages doc1.pdf 1-3 doc2.pdf 5-7 doc3.pdf 2,4 -- combined.pdf
```
#### PDF Optimization and Repair
```bash
# Optimize PDF for web (linearize for streaming)
qpdf --linearize input.pdf optimized.pdf
# Regenerate object streams and compress
qpdf --object-streams=generate --compress-streams=y input.pdf compressed.pdf
# Attempt to repair corrupted PDF structure
qpdf --check input.pdf
qpdf damaged.pdf repaired.pdf  # rewriting the file often repairs structural damage
# Show page structure for debugging
qpdf --show-pages input.pdf > structure.txt
```
#### Advanced Encryption
```bash
# Add password protection with specific permissions
qpdf --encrypt user_pass owner_pass 256 --print=none --modify=none -- input.pdf encrypted.pdf
# Check encryption status
qpdf --show-encryption encrypted.pdf
# Remove password protection (requires password)
qpdf --password=secret123 --decrypt encrypted.pdf decrypted.pdf
```
## Advanced Python Techniques
### pdfplumber Advanced Features
#### Extract Text with Precise Coordinates
```python
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
page = pdf.pages[0]
# Extract all text with coordinates
chars = page.chars
for char in chars[:10]: # First 10 characters
print(f"Char: '{char['text']}' at x:{char['x0']:.1f} y:{char['y0']:.1f}")
# Extract text by bounding box (left, top, right, bottom)
bbox_text = page.within_bbox((100, 100, 400, 200)).extract_text()
```
#### Advanced Table Extraction with Custom Settings
```python
import pdfplumber
import pandas as pd
with pdfplumber.open("complex_table.pdf") as pdf:
page = pdf.pages[0]
# Extract tables with custom settings for complex layouts
table_settings = {
"vertical_strategy": "lines",
"horizontal_strategy": "lines",
"snap_tolerance": 3,
"intersection_tolerance": 15
}
tables = page.extract_tables(table_settings)
# Visual debugging for table extraction
img = page.to_image(resolution=150)
img.save("debug_layout.png")
```
### reportlab Advanced Features
#### Create Professional Reports with Tables
```python
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib import colors
# Sample data
data = [
['Product', 'Q1', 'Q2', 'Q3', 'Q4'],
['Widgets', '120', '135', '142', '158'],
['Gadgets', '85', '92', '98', '105']
]
# Create PDF with table
doc = SimpleDocTemplate("report.pdf")
elements = []
# Add title
styles = getSampleStyleSheet()
title = Paragraph("Quarterly Sales Report", styles['Title'])
elements.append(title)
# Add table with advanced styling
table = Table(data)
table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (-1, 0), colors.grey),
('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
('ALIGN', (0, 0), (-1, -1), 'CENTER'),
('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
('FONTSIZE', (0, 0), (-1, 0), 14),
('BOTTOMPADDING', (0, 0), (-1, 0), 12),
('BACKGROUND', (0, 1), (-1, -1), colors.beige),
('GRID', (0, 0), (-1, -1), 1, colors.black)
]))
elements.append(table)
doc.build(elements)
```
## Complex Workflows
### Extract Figures/Images from PDF
#### Method 1: Using pdfimages (fastest)
```bash
# Extract all images with original quality
pdfimages -all document.pdf images/img
```
#### Method 2: Using pypdfium2 + Image Processing
```python
import os
import numpy as np
import pypdfium2 as pdfium

def extract_figures(pdf_path, output_dir):
    """Save the bounding box of non-white content on each page.

    This is a deliberately simple heuristic; isolating individual figures
    reliably needs real segmentation (contour detection, layout analysis).
    """
    pdf = pdfium.PdfDocument(pdf_path)
    for page_num, page in enumerate(pdf):
        # Render a high-resolution bitmap of the page
        bitmap = page.render(scale=3.0)
        img = bitmap.to_pil()
        img_array = np.array(img.convert("RGB"))
        # Mask of pixels that are not pure white
        mask = np.any(img_array != 255, axis=2)
        if not mask.any():
            continue  # blank page
        rows = np.any(mask, axis=1)
        cols = np.any(mask, axis=0)
        top, bottom = np.where(rows)[0][[0, -1]]
        left, right = np.where(cols)[0][[0, -1]]
        cropped = img.crop((left, top, right + 1, bottom + 1))
        cropped.save(os.path.join(output_dir, f"page_{page_num + 1}_content.png"))
```
### Batch PDF Processing with Error Handling
```python
import os
import glob
from pypdf import PdfReader, PdfWriter
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def batch_process_pdfs(input_dir, operation='merge'):
pdf_files = glob.glob(os.path.join(input_dir, "*.pdf"))
if operation == 'merge':
writer = PdfWriter()
for pdf_file in pdf_files:
try:
reader = PdfReader(pdf_file)
for page in reader.pages:
writer.add_page(page)
logger.info(f"Processed: {pdf_file}")
except Exception as e:
logger.error(f"Failed to process {pdf_file}: {e}")
continue
with open("batch_merged.pdf", "wb") as output:
writer.write(output)
elif operation == 'extract_text':
for pdf_file in pdf_files:
try:
reader = PdfReader(pdf_file)
text = ""
for page in reader.pages:
text += page.extract_text()
output_file = pdf_file.replace('.pdf', '.txt')
with open(output_file, 'w', encoding='utf-8') as f:
f.write(text)
logger.info(f"Extracted text from: {pdf_file}")
except Exception as e:
logger.error(f"Failed to extract text from {pdf_file}: {e}")
continue
```
### Advanced PDF Cropping
```python
from pypdf import PdfWriter, PdfReader
reader = PdfReader("input.pdf")
writer = PdfWriter()
# Crop page (left, bottom, right, top in points)
page = reader.pages[0]
page.mediabox.left = 50
page.mediabox.bottom = 50
page.mediabox.right = 550
page.mediabox.top = 750
writer.add_page(page)
with open("cropped.pdf", "wb") as output:
writer.write(output)
```
## Performance Optimization Tips
### 1. For Large PDFs
- Use streaming approaches instead of loading entire PDF in memory
- Use `qpdf --split-pages` for splitting large files
- Process pages individually with pypdfium2
### 2. For Text Extraction
- Plain `pdftotext` is fastest for raw text extraction; add `-bbox-layout` only when you need coordinates
- Use pdfplumber for structured data and tables
- Avoid `pypdf.extract_text()` for very large documents
### 3. For Image Extraction
- `pdfimages` is much faster than rendering pages
- Use low resolution for previews, high resolution for final output
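When choosing a resolution, it helps to work backward from the pixel size you actually need. The helper below is a small stdlib-only sketch (the function names are our own, not from any library): given a page width in PDF points (1 pt = 1/72 inch) and a target pixel width, it computes the DPI to pass to tools like `pdftoppm -r`, and the equivalent `scale` argument for pypdfium2, where `scale=1` corresponds to 72 DPI.

```python
def dpi_for_target_width(page_width_pt, target_px):
    """DPI needed so a page of page_width_pt points renders target_px wide."""
    inches = page_width_pt / 72.0  # PDF points are 1/72 inch
    return target_px / inches

def scale_for_dpi(dpi):
    """pypdfium2 render scale equivalent to a given DPI (scale=1 is 72 DPI)."""
    return dpi / 72.0

# A US Letter page (612 pt wide) rendered to ~1000 px for a preview
dpi = dpi_for_target_width(612, 1000)
print(round(dpi))                   # 118
print(round(scale_for_dpi(300), 2))  # 4.17
```

A ~100-150 DPI render is usually enough for previews and visual debugging; reserve 300+ DPI for OCR or print-quality output, since render time and memory grow quadratically with DPI.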
### 4. For Form Filling
- pdf-lib maintains form structure better than most alternatives
- Pre-validate form fields before processing
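Pre-validation can be as simple as checking proposed values against field metadata before touching the PDF. The sketch below is hypothetical: the field-dict shape loosely mirrors the metadata produced by the extraction scripts in this skill, but the helper itself is illustrative, not part of any library.

```python
def validate_form_values(fields, values):
    """Return a list of error strings for values that don't fit their fields.

    `fields` maps field id -> {"type": ..., "options": [...]} metadata;
    `values` maps field id -> proposed value.
    """
    errors = []
    for field_id, value in values.items():
        meta = fields.get(field_id)
        if meta is None:
            errors.append(f"unknown field: {field_id}")
        elif meta["type"] in ("choice", "radio_group") and value not in meta.get("options", []):
            errors.append(f"invalid option {value!r} for {field_id}")
    return errors

fields = {
    "name": {"type": "text"},
    "state": {"type": "choice", "options": ["CA", "NY"]},
}
print(validate_form_values(fields, {"name": "Ada", "state": "TX", "zip": "94110"}))
```

Failing fast like this is cheaper than writing the PDF and inspecting it visually, and it produces actionable error messages instead of silently dropped values.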
### 5. Memory Management
```python
# Process PDFs in chunks
def process_large_pdf(pdf_path, chunk_size=10):
reader = PdfReader(pdf_path)
total_pages = len(reader.pages)
for start_idx in range(0, total_pages, chunk_size):
end_idx = min(start_idx + chunk_size, total_pages)
writer = PdfWriter()
for i in range(start_idx, end_idx):
writer.add_page(reader.pages[i])
# Process chunk
with open(f"chunk_{start_idx//chunk_size}.pdf", "wb") as output:
writer.write(output)
```
## Troubleshooting Common Issues
### Encrypted PDFs
```python
# Handle password-protected PDFs
from pypdf import PdfReader
try:
reader = PdfReader("encrypted.pdf")
if reader.is_encrypted:
reader.decrypt("password")
except Exception as e:
print(f"Failed to decrypt: {e}")
```
### Corrupted PDFs
```bash
# Use qpdf to repair
qpdf --check corrupted.pdf
qpdf --replace-input corrupted.pdf
```
### Text Extraction Issues
```python
# Fallback to OCR for scanned PDFs
import pytesseract
from pdf2image import convert_from_path
def extract_text_with_ocr(pdf_path):
images = convert_from_path(pdf_path)
text = ""
for i, image in enumerate(images):
text += pytesseract.image_to_string(image)
return text
```
## License Information
- **pypdf**: BSD License
- **pdfplumber**: MIT License
- **pypdfium2**: Apache/BSD License
- **reportlab**: BSD License
- **poppler-utils**: GPL-2 License
- **qpdf**: Apache License
- **pdf-lib**: MIT License
- **pdfjs-dist**: Apache License

View File

@@ -0,0 +1,65 @@
from dataclasses import dataclass
import json
import sys
@dataclass
class RectAndField:
rect: list[float]
rect_type: str
field: dict
def get_bounding_box_messages(fields_json_stream) -> list[str]:
messages = []
fields = json.load(fields_json_stream)
messages.append(f"Read {len(fields['form_fields'])} fields")
def rects_intersect(r1, r2):
disjoint_horizontal = r1[0] >= r2[2] or r1[2] <= r2[0]
disjoint_vertical = r1[1] >= r2[3] or r1[3] <= r2[1]
return not (disjoint_horizontal or disjoint_vertical)
rects_and_fields = []
for f in fields["form_fields"]:
rects_and_fields.append(RectAndField(f["label_bounding_box"], "label", f))
rects_and_fields.append(RectAndField(f["entry_bounding_box"], "entry", f))
has_error = False
for i, ri in enumerate(rects_and_fields):
for j in range(i + 1, len(rects_and_fields)):
rj = rects_and_fields[j]
if ri.field["page_number"] == rj.field["page_number"] and rects_intersect(ri.rect, rj.rect):
has_error = True
if ri.field is rj.field:
messages.append(f"FAILURE: intersection between label and entry bounding boxes for `{ri.field['description']}` ({ri.rect}, {rj.rect})")
else:
messages.append(f"FAILURE: intersection between {ri.rect_type} bounding box for `{ri.field['description']}` ({ri.rect}) and {rj.rect_type} bounding box for `{rj.field['description']}` ({rj.rect})")
if len(messages) >= 20:
messages.append("Aborting further checks; fix bounding boxes and try again")
return messages
if ri.rect_type == "entry":
if "entry_text" in ri.field:
font_size = ri.field["entry_text"].get("font_size", 14)
entry_height = ri.rect[3] - ri.rect[1]
if entry_height < font_size:
has_error = True
messages.append(f"FAILURE: entry bounding box height ({entry_height}) for `{ri.field['description']}` is too short for the text content (font size: {font_size}). Increase the box height or decrease the font size.")
if len(messages) >= 20:
messages.append("Aborting further checks; fix bounding boxes and try again")
return messages
if not has_error:
messages.append("SUCCESS: All bounding boxes are valid")
return messages
if __name__ == "__main__":
if len(sys.argv) != 2:
print("Usage: check_bounding_boxes.py [fields.json]")
sys.exit(1)
with open(sys.argv[1]) as f:
messages = get_bounding_box_messages(f)
for msg in messages:
print(msg)

View File

@@ -0,0 +1,11 @@
import sys
from pypdf import PdfReader
reader = PdfReader(sys.argv[1])
if reader.get_fields():
print("This PDF has fillable form fields")
else:
print("This PDF does not have fillable form fields; you will need to visually determine where to enter data")

View File

@@ -0,0 +1,33 @@
import os
import sys
from pdf2image import convert_from_path
def convert(pdf_path, output_dir, max_dim=1000):
images = convert_from_path(pdf_path, dpi=200)
for i, image in enumerate(images):
width, height = image.size
if width > max_dim or height > max_dim:
scale_factor = min(max_dim / width, max_dim / height)
new_width = int(width * scale_factor)
new_height = int(height * scale_factor)
image = image.resize((new_width, new_height))
image_path = os.path.join(output_dir, f"page_{i+1}.png")
image.save(image_path)
print(f"Saved page {i+1} as {image_path} (size: {image.size})")
print(f"Converted {len(images)} pages to PNG images")
if __name__ == "__main__":
if len(sys.argv) != 3:
print("Usage: convert_pdf_to_images.py [input pdf] [output directory]")
sys.exit(1)
pdf_path = sys.argv[1]
output_directory = sys.argv[2]
convert(pdf_path, output_directory)

View File

@@ -0,0 +1,37 @@
import json
import sys
from PIL import Image, ImageDraw
def create_validation_image(page_number, fields_json_path, input_path, output_path):
with open(fields_json_path, 'r') as f:
data = json.load(f)
img = Image.open(input_path)
draw = ImageDraw.Draw(img)
num_boxes = 0
for field in data["form_fields"]:
if field["page_number"] == page_number:
entry_box = field['entry_bounding_box']
label_box = field['label_bounding_box']
draw.rectangle(entry_box, outline='red', width=2)
draw.rectangle(label_box, outline='blue', width=2)
num_boxes += 2
img.save(output_path)
print(f"Created validation image at {output_path} with {num_boxes} bounding boxes")
if __name__ == "__main__":
if len(sys.argv) != 5:
print("Usage: create_validation_image.py [page number] [fields.json file] [input image path] [output image path]")
sys.exit(1)
page_number = int(sys.argv[1])
fields_json_path = sys.argv[2]
input_image_path = sys.argv[3]
output_image_path = sys.argv[4]
create_validation_image(page_number, fields_json_path, input_image_path, output_image_path)

View File

@@ -0,0 +1,122 @@
import json
import sys
from pypdf import PdfReader
def get_full_annotation_field_id(annotation):
components = []
while annotation:
field_name = annotation.get('/T')
if field_name:
components.append(field_name)
annotation = annotation.get('/Parent')
return ".".join(reversed(components)) if components else None
def make_field_dict(field, field_id):
field_dict = {"field_id": field_id}
ft = field.get('/FT')
if ft == "/Tx":
field_dict["type"] = "text"
elif ft == "/Btn":
field_dict["type"] = "checkbox"
states = field.get("/_States_", [])
if len(states) == 2:
if "/Off" in states:
field_dict["checked_value"] = states[0] if states[0] != "/Off" else states[1]
field_dict["unchecked_value"] = "/Off"
else:
                print(f"Unexpected state values for checkbox `{field_id}`. Its checked and unchecked values may not be correct; if you're trying to check it, visually verify the results.")
field_dict["checked_value"] = states[0]
field_dict["unchecked_value"] = states[1]
elif ft == "/Ch":
field_dict["type"] = "choice"
states = field.get("/_States_", [])
field_dict["choice_options"] = [{
"value": state[0],
"text": state[1],
} for state in states]
else:
field_dict["type"] = f"unknown ({ft})"
return field_dict
def get_field_info(reader: PdfReader):
fields = reader.get_fields()
field_info_by_id = {}
possible_radio_names = set()
for field_id, field in fields.items():
if field.get("/Kids"):
if field.get("/FT") == "/Btn":
possible_radio_names.add(field_id)
continue
field_info_by_id[field_id] = make_field_dict(field, field_id)
radio_fields_by_id = {}
for page_index, page in enumerate(reader.pages):
annotations = page.get('/Annots', [])
for ann in annotations:
field_id = get_full_annotation_field_id(ann)
if field_id in field_info_by_id:
field_info_by_id[field_id]["page"] = page_index + 1
field_info_by_id[field_id]["rect"] = ann.get('/Rect')
elif field_id in possible_radio_names:
try:
on_values = [v for v in ann["/AP"]["/N"] if v != "/Off"]
except KeyError:
continue
if len(on_values) == 1:
rect = ann.get("/Rect")
if field_id not in radio_fields_by_id:
radio_fields_by_id[field_id] = {
"field_id": field_id,
"type": "radio_group",
"page": page_index + 1,
"radio_options": [],
}
radio_fields_by_id[field_id]["radio_options"].append({
"value": on_values[0],
"rect": rect,
})
fields_with_location = []
for field_info in field_info_by_id.values():
if "page" in field_info:
fields_with_location.append(field_info)
else:
print(f"Unable to determine location for field id: {field_info.get('field_id')}, ignoring")
def sort_key(f):
if "radio_options" in f:
rect = f["radio_options"][0]["rect"] or [0, 0, 0, 0]
else:
rect = f.get("rect") or [0, 0, 0, 0]
adjusted_position = [-rect[1], rect[0]]
return [f.get("page"), adjusted_position]
sorted_fields = fields_with_location + list(radio_fields_by_id.values())
sorted_fields.sort(key=sort_key)
return sorted_fields
def write_field_info(pdf_path: str, json_output_path: str):
reader = PdfReader(pdf_path)
field_info = get_field_info(reader)
with open(json_output_path, "w") as f:
json.dump(field_info, f, indent=2)
print(f"Wrote {len(field_info)} fields to {json_output_path}")
if __name__ == "__main__":
if len(sys.argv) != 3:
print("Usage: extract_form_field_info.py [input pdf] [output json]")
sys.exit(1)
write_field_info(sys.argv[1], sys.argv[2])

View File

@@ -0,0 +1,115 @@
"""
Extract form structure from a non-fillable PDF.
This script analyzes the PDF to find:
- Text labels with their exact coordinates
- Horizontal lines (row boundaries)
- Checkboxes (small rectangles)
Output: A JSON file with the form structure that can be used to generate
accurate field coordinates for filling.
Usage: python extract_form_structure.py <input.pdf> <output.json>
"""
import json
import sys
import pdfplumber
def extract_form_structure(pdf_path):
structure = {
"pages": [],
"labels": [],
"lines": [],
"checkboxes": [],
"row_boundaries": []
}
with pdfplumber.open(pdf_path) as pdf:
for page_num, page in enumerate(pdf.pages, 1):
structure["pages"].append({
"page_number": page_num,
"width": float(page.width),
"height": float(page.height)
})
words = page.extract_words()
for word in words:
structure["labels"].append({
"page": page_num,
"text": word["text"],
"x0": round(float(word["x0"]), 1),
"top": round(float(word["top"]), 1),
"x1": round(float(word["x1"]), 1),
"bottom": round(float(word["bottom"]), 1)
})
for line in page.lines:
if abs(float(line["x1"]) - float(line["x0"])) > page.width * 0.5:
structure["lines"].append({
"page": page_num,
"y": round(float(line["top"]), 1),
"x0": round(float(line["x0"]), 1),
"x1": round(float(line["x1"]), 1)
})
for rect in page.rects:
width = float(rect["x1"]) - float(rect["x0"])
height = float(rect["bottom"]) - float(rect["top"])
if 5 <= width <= 15 and 5 <= height <= 15 and abs(width - height) < 2:
structure["checkboxes"].append({
"page": page_num,
"x0": round(float(rect["x0"]), 1),
"top": round(float(rect["top"]), 1),
"x1": round(float(rect["x1"]), 1),
"bottom": round(float(rect["bottom"]), 1),
"center_x": round((float(rect["x0"]) + float(rect["x1"])) / 2, 1),
"center_y": round((float(rect["top"]) + float(rect["bottom"])) / 2, 1)
})
lines_by_page = {}
for line in structure["lines"]:
page = line["page"]
if page not in lines_by_page:
lines_by_page[page] = []
lines_by_page[page].append(line["y"])
for page, y_coords in lines_by_page.items():
y_coords = sorted(set(y_coords))
for i in range(len(y_coords) - 1):
structure["row_boundaries"].append({
"page": page,
"row_top": y_coords[i],
"row_bottom": y_coords[i + 1],
"row_height": round(y_coords[i + 1] - y_coords[i], 1)
})
return structure
def main():
if len(sys.argv) != 3:
print("Usage: extract_form_structure.py <input.pdf> <output.json>")
sys.exit(1)
pdf_path = sys.argv[1]
output_path = sys.argv[2]
print(f"Extracting structure from {pdf_path}...")
structure = extract_form_structure(pdf_path)
with open(output_path, "w") as f:
json.dump(structure, f, indent=2)
print(f"Found:")
print(f" - {len(structure['pages'])} pages")
print(f" - {len(structure['labels'])} text labels")
print(f" - {len(structure['lines'])} horizontal lines")
print(f" - {len(structure['checkboxes'])} checkboxes")
print(f" - {len(structure['row_boundaries'])} row boundaries")
print(f"Saved to {output_path}")
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,98 @@
import json
import sys
from pypdf import PdfReader, PdfWriter
from extract_form_field_info import get_field_info
def fill_pdf_fields(input_pdf_path: str, fields_json_path: str, output_pdf_path: str):
with open(fields_json_path) as f:
fields = json.load(f)
fields_by_page = {}
for field in fields:
if "value" in field:
field_id = field["field_id"]
page = field["page"]
if page not in fields_by_page:
fields_by_page[page] = {}
fields_by_page[page][field_id] = field["value"]
reader = PdfReader(input_pdf_path)
has_error = False
field_info = get_field_info(reader)
fields_by_ids = {f["field_id"]: f for f in field_info}
for field in fields:
existing_field = fields_by_ids.get(field["field_id"])
if not existing_field:
has_error = True
print(f"ERROR: `{field['field_id']}` is not a valid field ID")
elif field["page"] != existing_field["page"]:
has_error = True
print(f"ERROR: Incorrect page number for `{field['field_id']}` (got {field['page']}, expected {existing_field['page']})")
else:
if "value" in field:
err = validation_error_for_field_value(existing_field, field["value"])
if err:
print(err)
has_error = True
if has_error:
sys.exit(1)
writer = PdfWriter(clone_from=reader)
for page, field_values in fields_by_page.items():
writer.update_page_form_field_values(writer.pages[page - 1], field_values, auto_regenerate=False)
writer.set_need_appearances_writer(True)
with open(output_pdf_path, "wb") as f:
writer.write(f)
def validation_error_for_field_value(field_info, field_value):
field_type = field_info["type"]
field_id = field_info["field_id"]
if field_type == "checkbox":
checked_val = field_info["checked_value"]
unchecked_val = field_info["unchecked_value"]
if field_value != checked_val and field_value != unchecked_val:
return f'ERROR: Invalid value "{field_value}" for checkbox field "{field_id}". The checked value is "{checked_val}" and the unchecked value is "{unchecked_val}"'
elif field_type == "radio_group":
option_values = [opt["value"] for opt in field_info["radio_options"]]
if field_value not in option_values:
return f'ERROR: Invalid value "{field_value}" for radio group field "{field_id}". Valid values are: {option_values}'
elif field_type == "choice":
choice_values = [opt["value"] for opt in field_info["choice_options"]]
if field_value not in choice_values:
return f'ERROR: Invalid value "{field_value}" for choice field "{field_id}". Valid values are: {choice_values}'
return None
def monkeypatch_pypdf_method():
    from pypdf.generic import DictionaryObject
    from pypdf.constants import FieldDictionaryAttributes
    original_get_inherited = DictionaryObject.get_inherited
    def patched_get_inherited(self, key: str, default=None):
        result = original_get_inherited(self, key, default)
        if key == FieldDictionaryAttributes.Opt:
            if isinstance(result, list) and all(isinstance(v, list) and len(v) == 2 for v in result):
                result = [r[0] for r in result]
        return result
    DictionaryObject.get_inherited = patched_get_inherited
if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("Usage: fill_fillable_fields.py [input pdf] [field_values.json] [output pdf]")
        sys.exit(1)
    monkeypatch_pypdf_method()
input_pdf = sys.argv[1]
fields_json = sys.argv[2]
output_pdf = sys.argv[3]
fill_pdf_fields(input_pdf, fields_json, output_pdf)

View File

@@ -0,0 +1,107 @@
import json
import sys
from pypdf import PdfReader, PdfWriter
from pypdf.annotations import FreeText
def transform_from_image_coords(bbox, image_width, image_height, pdf_width, pdf_height):
x_scale = pdf_width / image_width
y_scale = pdf_height / image_height
left = bbox[0] * x_scale
right = bbox[2] * x_scale
top = pdf_height - (bbox[1] * y_scale)
bottom = pdf_height - (bbox[3] * y_scale)
return left, bottom, right, top
def transform_from_pdf_coords(bbox, pdf_height):
left = bbox[0]
right = bbox[2]
pypdf_top = pdf_height - bbox[1]
pypdf_bottom = pdf_height - bbox[3]
return left, pypdf_bottom, right, pypdf_top
def fill_pdf_form(input_pdf_path, fields_json_path, output_pdf_path):
with open(fields_json_path, "r") as f:
fields_data = json.load(f)
reader = PdfReader(input_pdf_path)
writer = PdfWriter()
writer.append(reader)
pdf_dimensions = {}
for i, page in enumerate(reader.pages):
mediabox = page.mediabox
pdf_dimensions[i + 1] = [mediabox.width, mediabox.height]
annotations = []
for field in fields_data["form_fields"]:
page_num = field["page_number"]
page_info = next(p for p in fields_data["pages"] if p["page_number"] == page_num)
pdf_width, pdf_height = pdf_dimensions[page_num]
if "pdf_width" in page_info:
transformed_entry_box = transform_from_pdf_coords(
field["entry_bounding_box"],
float(pdf_height)
)
else:
image_width = page_info["image_width"]
image_height = page_info["image_height"]
transformed_entry_box = transform_from_image_coords(
field["entry_bounding_box"],
image_width, image_height,
float(pdf_width), float(pdf_height)
)
if "entry_text" not in field or "text" not in field["entry_text"]:
continue
entry_text = field["entry_text"]
text = entry_text["text"]
if not text:
continue
font_name = entry_text.get("font", "Arial")
font_size = str(entry_text.get("font_size", 14)) + "pt"
font_color = entry_text.get("font_color", "000000")
annotation = FreeText(
text=text,
rect=transformed_entry_box,
font=font_name,
font_size=font_size,
font_color=font_color,
border_color=None,
background_color=None,
)
annotations.append(annotation)
writer.add_annotation(page_number=page_num - 1, annotation=annotation)
with open(output_pdf_path, "wb") as output:
writer.write(output)
print(f"Successfully filled PDF form and saved to {output_pdf_path}")
print(f"Added {len(annotations)} text annotations")
if __name__ == "__main__":
if len(sys.argv) != 4:
print("Usage: fill_pdf_form_with_annotations.py [input pdf] [fields.json] [output pdf]")
sys.exit(1)
input_pdf = sys.argv[1]
fields_json = sys.argv[2]
output_pdf = sys.argv[3]
fill_pdf_form(input_pdf, fields_json, output_pdf)

View File

@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright (c) Microsoft Corporation.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -0,0 +1,14 @@
This skill includes material derived from the Microsoft playwright-cli repository.
Source:
- Repository: microsoft/playwright-cli
- Path: skills/playwright-cli/SKILL.md
Copyright (c) Microsoft Corporation.
Licensed under the Apache License, Version 2.0.
See LICENSE.txt in this directory.
Modifications:
- Adapted for the Codex skill collection.
- Added a wrapper script and local reference guides.


@@ -0,0 +1,147 @@
---
name: "playwright"
description: "Use when the task requires automating a real browser from the terminal (navigation, form filling, snapshots, screenshots, data extraction, UI-flow debugging) via `playwright-cli` or the bundled wrapper script."
---
# Playwright CLI Skill
Drive a real browser from the terminal using `playwright-cli`. Prefer the bundled wrapper script so the CLI works even when it is not globally installed.
Treat this skill as CLI-first automation. Do not pivot to `@playwright/test` unless the user explicitly asks for test files.
## Prerequisite check (required)
Before proposing commands, check whether `npx` is available (the wrapper depends on it):
```bash
command -v npx >/dev/null 2>&1
```
If it is not available, pause and ask the user to install Node.js/npm (which provides `npx`). Provide these steps verbatim:
```bash
# Verify Node/npm are installed
node --version
npm --version
# If missing, install Node.js/npm, then:
npm install -g @playwright/cli@latest
playwright-cli --help
```
Once `npx` is present, proceed with the wrapper script. A global install of `playwright-cli` is optional.
## Skill path (set once)
```bash
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export PWCLI="$CODEX_HOME/skills/playwright/scripts/playwright_cli.sh"
```
User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).
## Quick start
Use the wrapper script:
```bash
"$PWCLI" open https://playwright.dev --headed
"$PWCLI" snapshot
"$PWCLI" click e15
"$PWCLI" type "Playwright"
"$PWCLI" press Enter
"$PWCLI" screenshot
```
If the user prefers a global install, this is also valid:
```bash
npm install -g @playwright/cli@latest
playwright-cli --help
```
## Core workflow
1. Open the page.
2. Snapshot to get stable element refs.
3. Interact using refs from the latest snapshot.
4. Re-snapshot after navigation or significant DOM changes.
5. Capture artifacts (screenshot, pdf, traces) when useful.
Minimal loop:
```bash
"$PWCLI" open https://example.com
"$PWCLI" snapshot
"$PWCLI" click e3
"$PWCLI" snapshot
```
## When to snapshot again
Snapshot again after:
- navigation
- clicking elements that change the UI substantially
- opening/closing modals or menus
- tab switches
Refs can go stale. When a command fails due to a missing ref, snapshot again.
## Recommended patterns
### Form fill and submit
```bash
"$PWCLI" open https://example.com/form
"$PWCLI" snapshot
"$PWCLI" fill e1 "user@example.com"
"$PWCLI" fill e2 "password123"
"$PWCLI" click e3
"$PWCLI" snapshot
```
### Debug a UI flow with traces
```bash
"$PWCLI" open https://example.com --headed
"$PWCLI" tracing-start
# ...interactions...
"$PWCLI" tracing-stop
```
### Multi-tab work
```bash
"$PWCLI" tab-new https://example.com
"$PWCLI" tab-list
"$PWCLI" tab-select 0
"$PWCLI" snapshot
```
## Wrapper script
The wrapper script uses `npx --package @playwright/cli playwright-cli` so the CLI can run without a global install:
```bash
"$PWCLI" --help
```
Prefer the wrapper unless the repository already standardizes on a global install.
## References
Open only what you need:
- CLI command reference: `references/cli.md`
- Practical workflows and troubleshooting: `references/workflows.md`
## Guardrails
- Always snapshot before referencing element ids like `e12`.
- Re-snapshot when refs seem stale.
- Prefer explicit commands over `eval` and `run-code` unless needed.
- When you do not have a fresh snapshot, use placeholder refs like `eX` and say why; do not bypass refs with `run-code`.
- Use `--headed` when a visual check will help.
- When capturing artifacts in this repo, use `output/playwright/` and avoid introducing new top-level artifact folders.
- Default to CLI commands and workflows, not Playwright test specs.


@@ -0,0 +1,6 @@
interface:
display_name: "Playwright CLI Skill"
short_description: "Automate real browsers from the terminal"
icon_small: "./assets/playwright-small.svg"
icon_large: "./assets/playwright.png"
default_prompt: "Automate this browser workflow with Playwright and produce a reliable script with run steps."


@@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" viewBox="0 0 16 16">
<path d="m8.55 7.568.124.028 5.16 1.548.137.054c.606.294.645 1.165.068 1.512l-.133.066-2.236.894-.894 2.236c-.285.713-1.263.714-1.578.065l-.054-.138-1.548-5.16a.866.866 0 0 1 .954-1.105ZM10 12.983l.715-1.787.037-.08a.865.865 0 0 1 .445-.402L12.983 10 8.721 8.72l1.278 4.262ZM4.723 10.38a.532.532 0 0 1 .752.752l-1.414 1.414a.532.532 0 1 1-.752-.752l1.414-1.414ZM2.27 5.86l1.932.517.1.039a.533.533 0 0 1-.269 1.007l-.106-.018-1.932-.517-.101-.039a.532.532 0 0 1 .27-1.006l.106.017Zm9.608-2.62a.532.532 0 0 1 .668.82l-1.414 1.414a.532.532 0 1 1-.752-.752l1.414-1.414.084-.068ZM6.237 1.618a.532.532 0 0 1 .652.377l.518 1.932.017.106a.533.533 0 0 1-1.007.27l-.039-.101-.517-1.932-.017-.106a.532.532 0 0 1 .393-.546Z"/>
</svg>


@@ -0,0 +1,116 @@
# Playwright CLI Reference
Use the wrapper script unless the CLI is already installed globally:
```bash
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export PWCLI="$CODEX_HOME/skills/playwright/scripts/playwright_cli.sh"
"$PWCLI" --help
```
User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).
Optional convenience alias:
```bash
alias pwcli="$PWCLI"
```
## Core
```bash
pwcli open https://example.com
pwcli close
pwcli snapshot
pwcli click e3
pwcli dblclick e7
pwcli type "search terms"
pwcli press Enter
pwcli fill e5 "user@example.com"
pwcli drag e2 e8
pwcli hover e4
pwcli select e9 "option-value"
pwcli upload ./document.pdf
pwcli check e12
pwcli uncheck e12
pwcli eval "document.title"
pwcli eval "el => el.textContent" e5
pwcli dialog-accept
pwcli dialog-accept "confirmation text"
pwcli dialog-dismiss
pwcli resize 1920 1080
```
## Navigation
```bash
pwcli go-back
pwcli go-forward
pwcli reload
```
## Keyboard
```bash
pwcli press Enter
pwcli press ArrowDown
pwcli keydown Shift
pwcli keyup Shift
```
## Mouse
```bash
pwcli mousemove 150 300
pwcli mousedown
pwcli mousedown right
pwcli mouseup
pwcli mouseup right
pwcli mousewheel 0 100
```
## Save as
```bash
pwcli screenshot
pwcli screenshot e5
pwcli pdf
```
## Tabs
```bash
pwcli tab-list
pwcli tab-new
pwcli tab-new https://example.com/page
pwcli tab-close
pwcli tab-close 2
pwcli tab-select 0
```
## DevTools
```bash
pwcli console
pwcli console warning
pwcli network
pwcli run-code "await page.waitForTimeout(1000)"
pwcli tracing-start
pwcli tracing-stop
```
## Sessions
Use a named session to isolate work:
```bash
pwcli --session todo open https://demo.playwright.dev/todomvc
pwcli --session todo snapshot
```
Or set an environment variable once:
```bash
export PLAYWRIGHT_CLI_SESSION=todo
pwcli open https://demo.playwright.dev/todomvc
```


@@ -0,0 +1,95 @@
# Playwright CLI Workflows
Use the wrapper script and snapshot often.
Assume `PWCLI` is set and `pwcli` is an alias for `"$PWCLI"`.
In this repo, run commands from `output/playwright/<label>/` to keep artifacts contained.
## Standard interaction loop
```bash
pwcli open https://example.com
pwcli snapshot
pwcli click e3
pwcli snapshot
```
## Form submission
```bash
pwcli open https://example.com/form --headed
pwcli snapshot
pwcli fill e1 "user@example.com"
pwcli fill e2 "password123"
pwcli click e3
pwcli snapshot
pwcli screenshot
```
## Data extraction
```bash
pwcli open https://example.com
pwcli snapshot
pwcli eval "document.title"
pwcli eval "el => el.textContent" e12
```
## Debugging and inspection
Capture console messages and network activity after reproducing an issue:
```bash
pwcli console warning
pwcli network
```
Record a trace around a suspicious flow:
```bash
pwcli tracing-start
# reproduce the issue
pwcli tracing-stop
pwcli screenshot
```
## Sessions
Use sessions to isolate work across projects:
```bash
pwcli --session marketing open https://example.com
pwcli --session marketing snapshot
pwcli --session checkout open https://example.com/checkout
```
Or set the session once:
```bash
export PLAYWRIGHT_CLI_SESSION=checkout
pwcli open https://example.com/checkout
```
## Configuration file
By default, the CLI reads `playwright-cli.json` from the current directory. Use `--config` to point at a specific file.
Minimal example:
```json
{
"browser": {
"launchOptions": {
"headless": false
},
"contextOptions": {
"viewport": { "width": 1280, "height": 720 }
}
}
}
```
## Troubleshooting
- If an element ref fails, run `pwcli snapshot` again and retry.
- If the page looks wrong, re-open with `--headed` and resize the window.
- If a flow depends on prior state, use a named `--session`.


@@ -0,0 +1,25 @@
#!/usr/bin/env bash
set -euo pipefail
if ! command -v npx >/dev/null 2>&1; then
echo "Error: npx is required but not found on PATH." >&2
exit 1
fi
has_session_flag="false"
for arg in "$@"; do
case "$arg" in
--session|--session=*)
has_session_flag="true"
break
;;
esac
done
cmd=(npx --yes --package @playwright/cli playwright-cli)
if [[ "${has_session_flag}" != "true" && -n "${PLAYWRIGHT_CLI_SESSION:-}" ]]; then
cmd+=(--session "${PLAYWRIGHT_CLI_SESSION}")
fi
cmd+=("$@")
exec "${cmd[@]}"


@@ -0,0 +1,213 @@
---
name: receiving-code-review
description: Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable - requires technical rigor and verification, not performative agreement or blind implementation
---
# Code Review Reception
## Overview
Code review requires technical evaluation, not emotional performance.
**Core principle:** Verify before implementing. Ask before assuming. Technical correctness over social comfort.
## The Response Pattern
```
WHEN receiving code review feedback:
1. READ: Complete feedback without reacting
2. UNDERSTAND: Restate requirement in own words (or ask)
3. VERIFY: Check against codebase reality
4. EVALUATE: Technically sound for THIS codebase?
5. RESPOND: Technical acknowledgment or reasoned pushback
6. IMPLEMENT: One item at a time, test each
```
## Forbidden Responses
**NEVER:**
- "You're absolutely right!" (explicit CLAUDE.md violation)
- "Great point!" / "Excellent feedback!" (performative)
- "Let me implement that now" (before verification)
**INSTEAD:**
- Restate the technical requirement
- Ask clarifying questions
- Push back with technical reasoning if wrong
- Just start working (actions > words)
## Handling Unclear Feedback
```
IF any item is unclear:
STOP - do not implement anything yet
ASK for clarification on unclear items
WHY: Items may be related. Partial understanding = wrong implementation.
```
**Example:**
```
your human partner: "Fix 1-6"
You understand 1,2,3,6. Unclear on 4,5.
❌ WRONG: Implement 1,2,3,6 now, ask about 4,5 later
✅ RIGHT: "I understand items 1,2,3,6. Need clarification on 4 and 5 before proceeding."
```
## Source-Specific Handling
### From your human partner
- **Trusted** - implement after understanding
- **Still ask** if scope unclear
- **No performative agreement**
- **Skip to action** or technical acknowledgment
### From External Reviewers
```
BEFORE implementing:
1. Check: Technically correct for THIS codebase?
2. Check: Breaks existing functionality?
3. Check: Reason for current implementation?
4. Check: Works on all platforms/versions?
5. Check: Does reviewer understand full context?
IF suggestion seems wrong:
Push back with technical reasoning
IF can't easily verify:
Say so: "I can't verify this without [X]. Should I [investigate/ask/proceed]?"
IF conflicts with your human partner's prior decisions:
Stop and discuss with your human partner first
```
**your human partner's rule:** "External feedback - be skeptical, but check carefully"
## YAGNI Check for "Professional" Features
```
IF reviewer suggests "implementing properly":
grep codebase for actual usage
IF unused: "This endpoint isn't called. Remove it (YAGNI)?"
IF used: Then implement properly
```
**your human partner's rule:** "You and reviewer both report to me. If we don't need this feature, don't add it."
## Implementation Order
```
FOR multi-item feedback:
1. Clarify anything unclear FIRST
2. Then implement in this order:
- Blocking issues (breaks, security)
- Simple fixes (typos, imports)
- Complex fixes (refactoring, logic)
3. Test each fix individually
4. Verify no regressions
```
## When To Push Back
Push back when:
- Suggestion breaks existing functionality
- Reviewer lacks full context
- Violates YAGNI (unused feature)
- Technically incorrect for this stack
- Legacy/compatibility reasons exist
- Conflicts with your human partner's architectural decisions
**How to push back:**
- Use technical reasoning, not defensiveness
- Ask specific questions
- Reference working tests/code
- Involve your human partner if architectural
**Signal if uncomfortable pushing back out loud:** "Strange things are afoot at the Circle K"
## Acknowledging Correct Feedback
When feedback IS correct:
```
✅ "Fixed. [Brief description of what changed]"
✅ "Good catch - [specific issue]. Fixed in [location]."
✅ [Just fix it and show in the code]
❌ "You're absolutely right!"
❌ "Great point!"
❌ "Thanks for catching that!"
❌ "Thanks for [anything]"
❌ ANY gratitude expression
```
**Why no thanks:** Actions speak. Just fix it. The code itself shows you heard the feedback.
**If you catch yourself about to write "Thanks":** DELETE IT. State the fix instead.
## Gracefully Correcting Your Pushback
If you pushed back and were wrong:
```
✅ "You were right - I checked [X] and it does [Y]. Implementing now."
✅ "Verified this and you're correct. My initial understanding was wrong because [reason]. Fixing."
❌ Long apology
❌ Defending why you pushed back
❌ Over-explaining
```
State the correction factually and move on.
## Common Mistakes
| Mistake | Fix |
|---------|-----|
| Performative agreement | State requirement or just act |
| Blind implementation | Verify against codebase first |
| Batch without testing | One at a time, test each |
| Assuming reviewer is right | Check if breaks things |
| Avoiding pushback | Technical correctness > comfort |
| Partial implementation | Clarify all items first |
| Can't verify, proceed anyway | State limitation, ask for direction |
## Real Examples
**Performative Agreement (Bad):**
```
Reviewer: "Remove legacy code"
❌ "You're absolutely right! Let me remove that..."
```
**Technical Verification (Good):**
```
Reviewer: "Remove legacy code"
✅ "Checking... build target is 10.15+, this API needs 13+. Need legacy for backward compat. Current impl has wrong bundle ID - fix it or drop pre-13 support?"
```
**YAGNI (Good):**
```
Reviewer: "Implement proper metrics tracking with database, date filters, CSV export"
✅ "Grepped codebase - nothing calls this endpoint. Remove it (YAGNI)? Or is there usage I'm missing?"
```
**Unclear Item (Good):**
```
your human partner: "Fix items 1-6"
You understand 1,2,3,6. Unclear on 4,5.
✅ "Understand 1,2,3,6. Need clarification on 4 and 5 before implementing."
```
## GitHub Thread Replies
When replying to inline review comments on GitHub, reply in the comment thread (`gh api repos/{owner}/{repo}/pulls/{pr}/comments/{id}/replies`), not as a top-level PR comment.
## The Bottom Line
**External feedback = suggestions to evaluate, not orders to follow.**
Verify. Question. Then implement.
No performative agreement. Technical rigor always.


@@ -0,0 +1,645 @@
---
name: refactor
description: 'Surgical code refactoring to improve maintainability without changing behavior. Covers extracting functions, renaming variables, breaking down god functions, improving type safety, eliminating code smells, and applying design patterns. Less drastic than repo-rebuilder; use for gradual improvements.'
license: MIT
---
# Refactor
## Overview
Improve code structure and readability without changing external behavior. Refactoring is gradual evolution, not revolution. Use this for improving existing code, not rewriting from scratch.
## When to Use
Use this skill when:
- Code is hard to understand or maintain
- Functions/classes are too large
- Code smells need addressing
- Adding features is difficult due to code structure
- User asks "clean up this code", "refactor this", "improve this"
---
## Refactoring Principles
### The Golden Rules
1. **Behavior is preserved** - Refactoring doesn't change what the code does, only how
2. **Small steps** - Make tiny changes, test after each
3. **Version control is your friend** - Commit at every safe state so you can roll back
4. **Tests are essential** - Without tests, you're not refactoring, you're editing
5. **One thing at a time** - Don't mix refactoring with feature changes
### When NOT to Refactor
```
- Code that works and won't change again (if it ain't broke...)
- Critical production code without tests (add tests first)
- When you're under a tight deadline
- "Just because" - need a clear purpose
```
---
## Common Code Smells & Fixes
### 1. Long Method/Function
```diff
# BAD: 200-line function that does everything
- async function processOrder(orderId) {
- // 50 lines: fetch order
- // 30 lines: validate order
- // 40 lines: calculate pricing
- // 30 lines: update inventory
- // 20 lines: create shipment
- // 30 lines: send notifications
- }
# GOOD: Broken into focused functions
+ async function processOrder(orderId) {
+ const order = await fetchOrder(orderId);
+ validateOrder(order);
+ const pricing = calculatePricing(order);
+ await updateInventory(order);
+ const shipment = await createShipment(order);
+ await sendNotifications(order, pricing, shipment);
+ return { order, pricing, shipment };
+ }
```
### 2. Duplicated Code
```diff
# BAD: Same logic in multiple places
- function calculateUserDiscount(user) {
- if (user.membership === 'gold') return user.total * 0.2;
- if (user.membership === 'silver') return user.total * 0.1;
- return 0;
- }
-
- function calculateOrderDiscount(order) {
- if (order.user.membership === 'gold') return order.total * 0.2;
- if (order.user.membership === 'silver') return order.total * 0.1;
- return 0;
- }
# GOOD: Extract common logic
+ function getMembershipDiscountRate(membership) {
+ const rates = { gold: 0.2, silver: 0.1 };
+ return rates[membership] || 0;
+ }
+
+ function calculateUserDiscount(user) {
+ return user.total * getMembershipDiscountRate(user.membership);
+ }
+
+ function calculateOrderDiscount(order) {
+ return order.total * getMembershipDiscountRate(order.user.membership);
+ }
```
### 3. Large Class/Module
```diff
# BAD: God object that knows too much
- class UserManager {
- createUser() { /* ... */ }
- updateUser() { /* ... */ }
- deleteUser() { /* ... */ }
- sendEmail() { /* ... */ }
- generateReport() { /* ... */ }
- handlePayment() { /* ... */ }
- validateAddress() { /* ... */ }
- // 50 more methods...
- }
# GOOD: Single responsibility per class
+ class UserService {
+ create(data) { /* ... */ }
+ update(id, data) { /* ... */ }
+ delete(id) { /* ... */ }
+ }
+
+ class EmailService {
+ send(to, subject, body) { /* ... */ }
+ }
+
+ class ReportService {
+ generate(type, params) { /* ... */ }
+ }
+
+ class PaymentService {
+ process(amount, method) { /* ... */ }
+ }
```
### 4. Long Parameter List
```diff
# BAD: Too many parameters
- function createUser(email, password, name, age, address, city, country, phone) {
- /* ... */
- }
# GOOD: Group related parameters
+ interface UserData {
+ email: string;
+ password: string;
+ name: string;
+ age?: number;
+ address?: Address;
+ phone?: string;
+ }
+
+ function createUser(data: UserData) {
+ /* ... */
+ }
# EVEN BETTER: Use builder pattern for complex construction
+ const user = UserBuilder
+ .email('test@example.com')
+ .password('secure123')
+ .name('Test User')
+ .address(address)
+ .build();
```
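The `UserBuilder` above is referenced but never defined. A minimal sketch of such a builder might look like this (the fields, the `new UserBuilder()` starting point, and the validation rules are illustrative assumptions, not a fixed API):

```typescript
interface Address { street: string; city: string; country: string; }

interface User {
  email: string;
  password: string;
  name: string;
  address?: Address;
}

// Minimal fluent builder: each setter returns `this`, build() validates once.
class UserBuilder {
  private data: Partial<User> = {};

  email(value: string): this { this.data.email = value; return this; }
  password(value: string): this { this.data.password = value; return this; }
  name(value: string): this { this.data.name = value; return this; }
  address(value: Address): this { this.data.address = value; return this; }

  build(): User {
    const { email, password, name } = this.data;
    if (!email || !password || !name) {
      throw new Error('email, password, and name are required');
    }
    return { ...this.data, email, password, name };
  }
}

const user = new UserBuilder()
  .email('test@example.com')
  .password('secure123')
  .name('Test User')
  .build();
```

The payoff is that required-field validation lives in one place (`build()`), while optional fields stay optional without a long parameter list.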
### 5. Feature Envy
```diff
# BAD: Method that uses another object's data more than its own
- class Order {
- calculateDiscount(user) {
- if (user.membershipLevel === 'gold') {
-       return this.total * 0.2;
-     }
-     if (user.accountAge > 365) {
-       return this.total * 0.1;
-     }
-     return 0;
-   }
- }
# GOOD: Move logic to the object that owns the data
+ class User {
+ getDiscountRate(orderTotal) {
+ if (this.membershipLevel === 'gold') return 0.2;
+ if (this.accountAge > 365) return 0.1;
+ return 0;
+ }
+ }
+
+ class Order {
+ calculateDiscount(user) {
+ return this.total * user.getDiscountRate(this.total);
+ }
+ }
```
### 6. Primitive Obsession
```diff
# BAD: Using primitives for domain concepts
- function sendEmail(to, subject, body) { /* ... */ }
- sendEmail('user@example.com', 'Hello', '...');
- function createPhone(country, number) {
- return `${country}-${number}`;
- }
# GOOD: Use domain types
+ class Email {
+ private constructor(public readonly value: string) {
+ if (!Email.isValid(value)) throw new Error('Invalid email');
+ }
+ static create(value: string) { return new Email(value); }
+ static isValid(email: string) { return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email); }
+ }
+
+ class PhoneNumber {
+ constructor(
+ public readonly country: string,
+ public readonly number: string
+ ) {
+ if (!PhoneNumber.isValid(country, number)) throw new Error('Invalid phone');
+ }
+ toString() { return `${this.country}-${this.number}`; }
+ static isValid(country: string, number: string) { /* ... */ }
+ }
+
+ // Usage
+ const email = Email.create('user@example.com');
+ const phone = new PhoneNumber('1', '555-1234');
```
### 7. Magic Numbers/Strings
```diff
# BAD: Unexplained values
- if (user.status === 2) { /* ... */ }
- const discount = total * 0.15;
- setTimeout(callback, 86400000);
# GOOD: Named constants
+ const UserStatus = {
+ ACTIVE: 1,
+ INACTIVE: 2,
+ SUSPENDED: 3
+ } as const;
+
+ const DISCOUNT_RATES = {
+ STANDARD: 0.1,
+ PREMIUM: 0.15,
+ VIP: 0.2
+ } as const;
+
+ const ONE_DAY_MS = 24 * 60 * 60 * 1000;
+
+ if (user.status === UserStatus.INACTIVE) { /* ... */ }
+ const discount = total * DISCOUNT_RATES.PREMIUM;
+ setTimeout(callback, ONE_DAY_MS);
```
### 8. Nested Conditionals
```diff
# BAD: Arrow code
- function process(order) {
- if (order) {
- if (order.user) {
- if (order.user.isActive) {
- if (order.total > 0) {
- return processOrder(order);
-         } else {
-           return { error: 'Invalid total' };
-         }
-       } else {
-         return { error: 'User inactive' };
-       }
-     } else {
-       return { error: 'No user' };
-     }
-   } else {
-     return { error: 'No order' };
-   }
- }
# GOOD: Guard clauses / early returns
+ function process(order) {
+ if (!order) return { error: 'No order' };
+ if (!order.user) return { error: 'No user' };
+ if (!order.user.isActive) return { error: 'User inactive' };
+ if (order.total <= 0) return { error: 'Invalid total' };
+ return processOrder(order);
+ }
# EVEN BETTER: Using Result type
+ function process(order): Result<ProcessedOrder, Error> {
+ return Result.combine([
+ validateOrderExists(order),
+ validateUserExists(order),
+ validateUserActive(order.user),
+ validateOrderTotal(order)
+ ]).flatMap(() => processOrder(order));
+ }
```
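The `Result` type in the last variant is assumed rather than defined. A minimal sketch supporting just the `combine` and `flatMap` calls used above might be (method names and semantics are one common convention, not a standard library):

```typescript
// Minimal Result: either an ok value or an error, never both.
class Result<T, E> {
  private constructor(
    private readonly value?: T,
    private readonly error?: E,
    private readonly failed: boolean = false
  ) {}

  static ok<T, E>(value: T): Result<T, E> { return new Result<T, E>(value); }
  static err<T, E>(error: E): Result<T, E> {
    return new Result<T, E>(undefined, error, true);
  }

  // First failure wins; otherwise an ok Result holding every value.
  static combine<T, E>(results: Result<T, E>[]): Result<T[], E> {
    const values: T[] = [];
    for (const r of results) {
      if (r.failed) return Result.err<T[], E>(r.error as E);
      values.push(r.value as T);
    }
    return Result.ok<T[], E>(values);
  }

  // Chain the next computation only if this one succeeded.
  flatMap<U>(fn: (value: T) => Result<U, E>): Result<U, E> {
    return this.failed ? Result.err<U, E>(this.error as E) : fn(this.value as T);
  }

  isOk(): boolean { return !this.failed; }
  unwrapOr(fallback: T): T { return this.failed ? fallback : (this.value as T); }
}
```

With this shape, each validator returns a `Result`, and `combine` short-circuits on the first error instead of nesting conditionals.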
### 9. Dead Code
```diff
# BAD: Unused code lingers
- function oldImplementation() { /* ... */ }
- const DEPRECATED_VALUE = 5;
- import { unusedThing } from './somewhere';
- // Commented out code
- // function oldCode() { /* ... */ }
# GOOD: Remove it
+ // Delete unused functions, imports, and commented code
+ // If you need it again, git history has it
```
### 10. Inappropriate Intimacy
```diff
# BAD: One class reaches deep into another
- class OrderProcessor {
- process(order) {
- order.user.profile.address.street; // Too intimate
- order.repository.connection.config; // Breaking encapsulation
-   }
- }
# GOOD: Ask, don't tell
+ class OrderProcessor {
+ process(order) {
+ order.getShippingAddress(); // Order knows how to get it
+ order.save(); // Order knows how to save itself
+ }
+ }
```
---
## Extract Method Refactoring
### Before and After
```diff
# Before: One long function
- function printReport(users) {
- console.log('USER REPORT');
- console.log('============');
- console.log('');
- console.log(`Total users: ${users.length}`);
- console.log('');
- console.log('ACTIVE USERS');
- console.log('------------');
- const active = users.filter(u => u.isActive);
- active.forEach(u => {
- console.log(`- ${u.name} (${u.email})`);
- });
- console.log('');
- console.log(`Active: ${active.length}`);
- console.log('');
- console.log('INACTIVE USERS');
- console.log('--------------');
- const inactive = users.filter(u => !u.isActive);
- inactive.forEach(u => {
- console.log(`- ${u.name} (${u.email})`);
- });
- console.log('');
- console.log(`Inactive: ${inactive.length}`);
- }
# After: Extracted methods
+ function printReport(users) {
+   printHeader('USER REPORT');
+   console.log(`Total users: ${users.length}\n`);
+   printUserSection('ACTIVE USERS', 'Active', users.filter(u => u.isActive));
+   printUserSection('INACTIVE USERS', 'Inactive', users.filter(u => !u.isActive));
+ }
+
+ function printHeader(title) {
+   console.log(title);
+   console.log('='.repeat(title.length));
+   console.log('');
+ }
+
+ // Label is passed in so output matches the original exactly ('Active: 3').
+ function printUserSection(title, label, users) {
+   console.log(title);
+   console.log('-'.repeat(title.length));
+   users.forEach(u => console.log(`- ${u.name} (${u.email})`));
+   console.log('');
+   console.log(`${label}: ${users.length}`);
+   console.log('');
+ }
```
---
## Introducing Type Safety
### From Untyped to Typed
```diff
# Before: No types
- function calculateDiscount(user, total, membership, date) {
- if (membership === 'gold' && date.getDay() === 5) {
- return total * 0.25;
- }
- if (membership === 'gold') return total * 0.2;
- return total * 0.1;
- }
# After: Full type safety
+ type Membership = 'bronze' | 'silver' | 'gold';
+
+ interface User {
+ id: string;
+ name: string;
+ membership: Membership;
+ }
+
+ interface DiscountResult {
+ original: number;
+ discount: number;
+ final: number;
+ rate: number;
+ }
+
+ function calculateDiscount(
+ user: User,
+ total: number,
+ date: Date = new Date()
+ ): DiscountResult {
+ if (total < 0) throw new Error('Total cannot be negative');
+
+ let rate = 0.1; // Default bronze
+
+ if (user.membership === 'gold' && date.getDay() === 5) {
+ rate = 0.25; // Friday bonus for gold
+ } else if (user.membership === 'gold') {
+ rate = 0.2;
+ } else if (user.membership === 'silver') {
+ rate = 0.15;
+ }
+
+ const discount = total * rate;
+
+ return {
+ original: total,
+ discount,
+ final: total - discount,
+ rate
+ };
+ }
```
---
## Design Patterns for Refactoring
### Strategy Pattern
```diff
# Before: Conditional logic
- function calculateShipping(order, method) {
- if (method === 'standard') {
- return order.total > 50 ? 0 : 5.99;
- } else if (method === 'express') {
- return order.total > 100 ? 9.99 : 14.99;
-   } else if (method === 'overnight') {
-     return 29.99;
-   }
- }
# After: Strategy pattern
+ interface ShippingStrategy {
+ calculate(order: Order): number;
+ }
+
+ class StandardShipping implements ShippingStrategy {
+ calculate(order: Order) {
+ return order.total > 50 ? 0 : 5.99;
+ }
+ }
+
+ class ExpressShipping implements ShippingStrategy {
+ calculate(order: Order) {
+ return order.total > 100 ? 9.99 : 14.99;
+ }
+ }
+
+ class OvernightShipping implements ShippingStrategy {
+ calculate(order: Order) {
+ return 29.99;
+ }
+ }
+
+ function calculateShipping(order: Order, strategy: ShippingStrategy) {
+ return strategy.calculate(order);
+ }
```
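One question the strategy version leaves open is how callers pick a strategy. A small registry keyed by method name is one option; the sketch below is self-contained, with the `Order` type trimmed to what the example needs:

```typescript
interface Order { total: number; }

interface ShippingStrategy {
  calculate(order: Order): number;
}

class StandardShipping implements ShippingStrategy {
  calculate(order: Order) { return order.total > 50 ? 0 : 5.99; }
}

class ExpressShipping implements ShippingStrategy {
  calculate(order: Order) { return order.total > 100 ? 9.99 : 14.99; }
}

class OvernightShipping implements ShippingStrategy {
  calculate() { return 29.99; }
}

// Registry keyed by method name: adding a method is one entry, no new conditionals.
const shippingStrategies: Record<string, ShippingStrategy> = {
  standard: new StandardShipping(),
  express: new ExpressShipping(),
  overnight: new OvernightShipping(),
};

function calculateShipping(order: Order, method: string): number {
  const strategy = shippingStrategies[method];
  if (!strategy) throw new Error(`Unknown shipping method: ${method}`);
  return strategy.calculate(order);
}
```

The registry also gives unknown methods an explicit failure path, which the original `if/else` chain silently fell through.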
### Chain of Responsibility
```diff
# Before: Nested validation
- function validate(user) {
- const errors = [];
- if (!user.email) errors.push('Email required');
-   else if (!isValidEmail(user.email)) errors.push('Invalid email');
-   if (!user.name) errors.push('Name required');
-   if (user.age < 18) errors.push('Must be 18+');
-   if (user.country === 'blocked') errors.push('Country not supported');
-   return errors;
- }
# After: Chain of responsibility
+ abstract class Validator {
+   private next?: Validator;
+   protected abstract doValidate(user: User): string | null;
+   setNext(validator: Validator): Validator {
+     this.next = validator;
+     return validator;
+   }
+   validate(user: User): string | null {
+     const error = this.doValidate(user);
+     if (error) return error;
+     return this.next?.validate(user) ?? null;
+   }
+ }
+
+ class EmailRequiredValidator extends Validator {
+ doValidate(user: User) {
+ return !user.email ? 'Email required' : null;
+ }
+ }
+
+ class EmailFormatValidator extends Validator {
+ doValidate(user: User) {
+ return user.email && !isValidEmail(user.email) ? 'Invalid email' : null;
+ }
+ }
+
+ // Build the chain
+ const validator = new EmailRequiredValidator()
+ .setNext(new EmailFormatValidator())
+ .setNext(new NameRequiredValidator())
+ .setNext(new AgeValidator())
+ .setNext(new CountryValidator());
```
---
## Refactoring Steps
### Safe Refactoring Process
```
1. PREPARE
- Ensure tests exist (write them if missing)
- Commit current state
- Create feature branch
2. IDENTIFY
- Find the code smell to address
- Understand what the code does
- Plan the refactoring
3. REFACTOR (small steps)
- Make one small change
- Run tests
- Commit if tests pass
- Repeat
4. VERIFY
- All tests pass
- Manual testing if needed
- Performance unchanged or improved
5. CLEAN UP
- Update comments
- Update documentation
- Final commit
```
---
## Refactoring Checklist
### Code Quality
- [ ] Functions are small (< 50 lines)
- [ ] Functions do one thing
- [ ] No duplicated code
- [ ] Descriptive names (variables, functions, classes)
- [ ] No magic numbers/strings
- [ ] Dead code removed
### Structure
- [ ] Related code is together
- [ ] Clear module boundaries
- [ ] Dependencies flow in one direction
- [ ] No circular dependencies
### Type Safety
- [ ] Types defined for all public APIs
- [ ] No `any` types without justification
- [ ] Nullable types explicitly marked
### Testing
- [ ] Refactored code is tested
- [ ] Tests cover edge cases
- [ ] All tests pass
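Two of the checklist items above — no magic numbers and explicitly marked nullable types — in a brief TypeScript sketch (the names are illustrative, not from any real codebase):

```typescript
// Named constants replace magic values; units live in the name.
const MAX_RETRIES = 3;
const REPORT_INTERVAL_MS = 5_000;

// The nullable return is explicit in the signature, not a convention
// the caller has to remember.
function findUserName(id: string, users: Map<string, string>): string | null {
  return users.get(id) ?? null;
}

function shouldRetry(attempt: number): boolean {
  return attempt < MAX_RETRIES;
}
```

A reader scanning `shouldRetry` now knows the limit's meaning without hunting for a bare `3`, and the `| null` forces callers to handle the missing-user case.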
---
## Common Refactoring Operations
| Operation | Description |
| --------------------------------------------- | ------------------------------------- |
| Extract Method | Turn code fragment into method |
| Extract Class | Move behavior to new class |
| Extract Interface | Create interface from implementation |
| Inline Method | Move method body back to caller |
| Inline Class | Move class behavior to caller |
| Pull Up Method | Move method to superclass |
| Push Down Method | Move method to subclass |
| Rename Method/Variable | Improve clarity |
| Introduce Parameter Object | Group related parameters |
| Replace Conditional with Polymorphism | Use polymorphism instead of switch/if |
| Replace Magic Number with Constant | Named constants |
| Decompose Conditional | Break complex conditions |
| Consolidate Conditional | Combine duplicate conditions |
| Replace Nested Conditional with Guard Clauses | Early returns |
| Introduce Null Object | Eliminate null checks |
| Replace Type Code with Class/Enum | Strong typing |
| Replace Inheritance with Delegation | Composition over inheritance |
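As one concrete instance of the table — Replace Nested Conditional with Guard Clauses — a before/after sketch using hypothetical shipping logic:

```typescript
interface Order {
  paid: boolean;
  weightKg: number;
}

// Before: the happy path is buried under nesting.
function shippingCostNested(order: Order): number {
  if (order.paid) {
    if (order.weightKg > 0) {
      return order.weightKg * 2;
    }
  }
  return 0;
}

// After: guard clauses return early, leaving the happy path flat.
function shippingCost(order: Order): number {
  if (!order.paid) return 0;
  if (order.weightKg <= 0) return 0;
  return order.weightKg * 2;
}
```

Both versions compute the same result; the refactored one reads top to bottom as "reject the exceptions, then do the work."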


@@ -0,0 +1,105 @@
---
name: requesting-code-review
description: Use when completing tasks, implementing major features, or before merging to verify work meets requirements
---
# Requesting Code Review
Dispatch superpowers:code-reviewer subagent to catch issues before they cascade.
**Core principle:** Review early, review often.
## When to Request Review
**Mandatory:**
- After each task in subagent-driven development
- After completing major feature
- Before merge to main
**Optional but valuable:**
- When stuck (fresh perspective)
- Before refactoring (baseline check)
- After fixing complex bug
## How to Request
**1. Get git SHAs:**
```bash
BASE_SHA=$(git rev-parse HEAD~1) # or origin/main
HEAD_SHA=$(git rev-parse HEAD)
```
**2. Dispatch code-reviewer subagent:**
Use Task tool with superpowers:code-reviewer type, fill template at `code-reviewer.md`
**Placeholders:**
- `{WHAT_WAS_IMPLEMENTED}` - What you just built
- `{PLAN_OR_REQUIREMENTS}` - What it should do
- `{BASE_SHA}` - Starting commit
- `{HEAD_SHA}` - Ending commit
- `{DESCRIPTION}` - Brief summary
**3. Act on feedback:**
- Fix Critical issues immediately
- Fix Important issues before proceeding
- Note Minor issues for later
- Push back if reviewer is wrong (with reasoning)
## Example
```
[Just completed Task 2: Add verification function]
You: Let me request code review before proceeding.
BASE_SHA=$(git log --oneline | grep "Task 1" | head -1 | awk '{print $1}')
HEAD_SHA=$(git rev-parse HEAD)
[Dispatch superpowers:code-reviewer subagent]
WHAT_WAS_IMPLEMENTED: Verification and repair functions for conversation index
PLAN_OR_REQUIREMENTS: Task 2 from docs/plans/deployment-plan.md
BASE_SHA: a7981ec
HEAD_SHA: 3df7661
DESCRIPTION: Added verifyIndex() and repairIndex() with 4 issue types
[Subagent returns]:
Strengths: Clean architecture, real tests
Issues:
Important: Missing progress indicators
Minor: Magic number (100) for reporting interval
Assessment: Ready to proceed
You: [Fix progress indicators]
[Continue to Task 3]
```
## Integration with Workflows
**Subagent-Driven Development:**
- Review after EACH task
- Catch issues before they compound
- Fix before moving to next task
**Executing Plans:**
- Review after each batch (3 tasks)
- Get feedback, apply, continue
**Ad-Hoc Development:**
- Review before merge
- Review when stuck
## Red Flags
**Never:**
- Skip review because "it's simple"
- Ignore Critical issues
- Proceed with unfixed Important issues
- Argue with valid technical feedback
**If reviewer wrong:**
- Push back with technical reasoning
- Show code/tests that prove it works
- Request clarification
See template at: requesting-code-review/code-reviewer.md


@@ -0,0 +1,146 @@
# Code Review Agent
You are reviewing code changes for production readiness.
**Your task:**
1. Review {WHAT_WAS_IMPLEMENTED}
2. Compare against {PLAN_OR_REQUIREMENTS}
3. Check code quality, architecture, testing
4. Categorize issues by severity
5. Assess production readiness
## What Was Implemented
{DESCRIPTION}
## Requirements/Plan
{PLAN_OR_REQUIREMENTS}
## Git Range to Review
**Base:** {BASE_SHA}
**Head:** {HEAD_SHA}
```bash
git diff --stat {BASE_SHA}..{HEAD_SHA}
git diff {BASE_SHA}..{HEAD_SHA}
```
## Review Checklist
**Code Quality:**
- Clean separation of concerns?
- Proper error handling?
- Type safety (if applicable)?
- DRY principle followed?
- Edge cases handled?
**Architecture:**
- Sound design decisions?
- Scalability considerations?
- Performance implications?
- Security concerns?
**Testing:**
- Tests actually test logic (not mocks)?
- Edge cases covered?
- Integration tests where needed?
- All tests passing?
**Requirements:**
- All plan requirements met?
- Implementation matches spec?
- No scope creep?
- Breaking changes documented?
**Production Readiness:**
- Migration strategy (if schema changes)?
- Backward compatibility considered?
- Documentation complete?
- No obvious bugs?
## Output Format
### Strengths
[What's well done? Be specific.]
### Issues
#### Critical (Must Fix)
[Bugs, security issues, data loss risks, broken functionality]
#### Important (Should Fix)
[Architecture problems, missing features, poor error handling, test gaps]
#### Minor (Nice to Have)
[Code style, optimization opportunities, documentation improvements]
**For each issue:**
- File:line reference
- What's wrong
- Why it matters
- How to fix (if not obvious)
### Recommendations
[Improvements for code quality, architecture, or process]
### Assessment
**Ready to merge?** [Yes/No/With fixes]
**Reasoning:** [Technical assessment in 1-2 sentences]
## Critical Rules
**DO:**
- Categorize by actual severity (not everything is Critical)
- Be specific (file:line, not vague)
- Explain WHY issues matter
- Acknowledge strengths
- Give clear verdict
**DON'T:**
- Say "looks good" without checking
- Mark nitpicks as Critical
- Give feedback on code you didn't review
- Be vague ("improve error handling")
- Avoid giving a clear verdict
## Example Output
```
### Strengths
- Clean database schema with proper migrations (db.ts:15-42)
- Comprehensive test coverage (18 tests, all edge cases)
- Good error handling with fallbacks (summarizer.ts:85-92)
### Issues
#### Important
1. **Missing help text in CLI wrapper**
- File: index-conversations:1-31
- Issue: No --help flag, users won't discover --concurrency
- Fix: Add --help case with usage examples
2. **Date validation missing**
- File: search.ts:25-27
- Issue: Invalid dates silently return no results
- Fix: Validate ISO format, throw error with example
#### Minor
1. **Progress indicators**
- File: indexer.ts:130
- Issue: No "X of Y" counter for long operations
- Impact: Users don't know how long to wait
### Recommendations
- Add progress reporting for user experience
- Consider config file for excluded projects (portability)
### Assessment
**Ready to merge: With fixes**
**Reasoning:** Core implementation is solid with good architecture and tests. Important issues (help text, date validation) are easily fixed and don't affect core functionality.
```


@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf of
any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -0,0 +1,267 @@
---
name: "screenshot"
description: "Use when the user explicitly asks for a desktop or system screenshot (full screen, specific app or window, or a pixel region), or when tool-specific capture capabilities are unavailable and an OS-level capture is needed."
---
# Screenshot Capture
Follow these save-location rules every time:
1) If the user specifies a path, save there.
2) If the user asks for a screenshot without a path, save to the OS default screenshot location.
3) If Codex needs a screenshot for its own inspection, save to the temp directory.
## Tool priority
- Prefer tool-specific screenshot capabilities when available (for example: a Figma MCP/skill for Figma files, or Playwright/agent-browser tools for browsers and Electron apps).
- Use this skill when explicitly asked, for whole-system desktop captures, or when a tool-specific capture cannot get what you need.
- Otherwise, treat this skill as the default for desktop apps without a better-integrated capture tool.
## macOS permission preflight (reduce repeated prompts)
On macOS, run the preflight helper once before window/app capture. It checks
Screen Recording permission, explains why it is needed, and requests it in one
place.
The helpers route Swift's module cache to `$TMPDIR/codex-swift-module-cache`
to avoid extra sandbox module-cache prompts.
```bash
bash <path-to-skill>/scripts/ensure_macos_permissions.sh
```
To avoid multiple sandbox approval prompts, combine preflight + capture in one
command when possible:
```bash
bash <path-to-skill>/scripts/ensure_macos_permissions.sh && \
python3 <path-to-skill>/scripts/take_screenshot.py --app "Codex"
```
For Codex inspection runs, keep the output in temp:
```bash
bash <path-to-skill>/scripts/ensure_macos_permissions.sh && \
python3 <path-to-skill>/scripts/take_screenshot.py --app "<App>" --mode temp
```
Use the bundled scripts to avoid re-deriving OS-specific commands.
## macOS and Linux (Python helper)
Run the helper from the repo root:
```bash
python3 <path-to-skill>/scripts/take_screenshot.py
```
Common patterns:
- Default location (user asked for "a screenshot"):
```bash
python3 <path-to-skill>/scripts/take_screenshot.py
```
- Temp location (Codex visual check):
```bash
python3 <path-to-skill>/scripts/take_screenshot.py --mode temp
```
- Explicit location (user provided a path or filename):
```bash
python3 <path-to-skill>/scripts/take_screenshot.py --path output/screen.png
```
- App/window capture by app name (macOS only; substring match is OK; captures all matching windows):
```bash
python3 <path-to-skill>/scripts/take_screenshot.py --app "Codex"
```
- Specific window title within an app (macOS only):
```bash
python3 <path-to-skill>/scripts/take_screenshot.py --app "Codex" --window-name "Settings"
```
- List matching window ids before capturing (macOS only):
```bash
python3 <path-to-skill>/scripts/take_screenshot.py --list-windows --app "Codex"
```
- Pixel region (x,y,w,h):
```bash
python3 <path-to-skill>/scripts/take_screenshot.py --mode temp --region 100,200,800,600
```
- Focused/active window (captures only the frontmost window; use `--app` to capture all windows):
```bash
python3 <path-to-skill>/scripts/take_screenshot.py --mode temp --active-window
```
- Specific window id (use --list-windows on macOS to discover ids):
```bash
python3 <path-to-skill>/scripts/take_screenshot.py --window-id 12345
```
The script prints one path per capture. When multiple windows or displays match, it prints multiple paths (one per line) and adds suffixes like `-w<windowId>` or `-d<display>`. View each path sequentially with the image viewer tool, and only manipulate images if needed or requested.
### Workflow examples
- "Take a look at <App> and tell me what you see": capture to temp, then view each printed path in order.
```bash
bash <path-to-skill>/scripts/ensure_macos_permissions.sh && \
python3 <path-to-skill>/scripts/take_screenshot.py --app "<App>" --mode temp
```
- "The design from Figma is not matching what is implemented": use a Figma MCP/skill to capture the design first, then capture the running app with this skill (typically to temp) and compare the raw screenshots before any manipulation.
### Multi-display behavior
- On macOS, full-screen captures save one file per display when multiple monitors are connected.
- On Linux and Windows, full-screen captures use the virtual desktop (all monitors in one image); use `--region` to isolate a single display when needed.
### Linux prerequisites and selection logic
The helper automatically selects the first available tool:
1) `scrot`
2) `gnome-screenshot`
3) ImageMagick `import`
If none are available, ask the user to install one of them and retry.
Coordinate regions require `scrot` or ImageMagick `import`.
`--app`, `--window-name`, and `--list-windows` are macOS-only. On Linux, use
`--active-window` or provide `--window-id` when available.
## Windows (PowerShell helper)
Run the PowerShell helper:
```powershell
powershell -ExecutionPolicy Bypass -File <path-to-skill>/scripts/take_screenshot.ps1
```
Common patterns:
- Default location:
```powershell
powershell -ExecutionPolicy Bypass -File <path-to-skill>/scripts/take_screenshot.ps1
```
- Temp location (Codex visual check):
```powershell
powershell -ExecutionPolicy Bypass -File <path-to-skill>/scripts/take_screenshot.ps1 -Mode temp
```
- Explicit path:
```powershell
powershell -ExecutionPolicy Bypass -File <path-to-skill>/scripts/take_screenshot.ps1 -Path "C:\Temp\screen.png"
```
- Pixel region (x,y,w,h):
```powershell
powershell -ExecutionPolicy Bypass -File <path-to-skill>/scripts/take_screenshot.ps1 -Mode temp -Region 100,200,800,600
```
- Active window (ask the user to focus it first):
```powershell
powershell -ExecutionPolicy Bypass -File <path-to-skill>/scripts/take_screenshot.ps1 -Mode temp -ActiveWindow
```
- Specific window handle (only when provided):
```powershell
powershell -ExecutionPolicy Bypass -File <path-to-skill>/scripts/take_screenshot.ps1 -WindowHandle 123456
```
## Direct OS commands (fallbacks)
Use these when you cannot run the helpers.
### macOS
- Full screen to a specific path:
```bash
screencapture -x output/screen.png
```
- Pixel region:
```bash
screencapture -x -R100,200,800,600 output/region.png
```
- Specific window id:
```bash
screencapture -x -l12345 output/window.png
```
- Interactive selection or window pick:
```bash
screencapture -x -i output/interactive.png
```
### Linux
- Full screen:
```bash
scrot output/screen.png
```
```bash
gnome-screenshot -f output/screen.png
```
```bash
import -window root output/screen.png
```
- Pixel region:
```bash
scrot -a 100,200,800,600 output/region.png
```
```bash
import -window root -crop 800x600+100+200 output/region.png
```
- Active window:
```bash
scrot -u output/window.png
```
```bash
gnome-screenshot -w -f output/window.png
```
## Error handling
- On macOS, run `bash <path-to-skill>/scripts/ensure_macos_permissions.sh` first to request Screen Recording in one place.
- If you see "screen capture checks are blocked in the sandbox", "could not create image from display", or Swift `ModuleCache` permission errors in a sandboxed run, rerun the command with escalated permissions.
- If macOS app/window capture returns no matches, run `--list-windows --app "AppName"` and retry with `--window-id`, and make sure the app is visible on screen.
- If Linux region/window capture fails, check tool availability with `command -v scrot`, `command -v gnome-screenshot`, and `command -v import`.
- If saving to the OS default location fails with permission errors in a sandbox, rerun the command with escalated permissions.
- Always report the saved file path in the response.


@@ -0,0 +1,6 @@
interface:
display_name: "Screenshot Capture"
short_description: "Capture screenshots"
icon_small: "./assets/screenshot-small.svg"
icon_large: "./assets/screenshot.png"
default_prompt: "Capture the right screenshot for this task (target, area, and output path)."


@@ -0,0 +1,5 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" viewBox="0 0 16 16">
<path fill="currentColor" d="M2.666 10.134c.294 0 .532.239.532.532v.667c0 .81.658 1.468 1.468 1.468h.667l.108.01a.533.533 0 0 1 0 1.043l-.108.01h-.667a2.532 2.532 0 0 1-2.532-2.531v-.667c0-.293.239-.532.532-.532Zm10.667 0c.293 0 .532.239.532.532v.667a2.532 2.532 0 0 1-2.532 2.532h-.667a.532.532 0 0 1 0-1.064h.667c.81 0 1.468-.657 1.468-1.468v-.667c0-.293.238-.531.532-.532Z"/>
<path fill="currentColor" fill-rule="evenodd" d="M8 5.468a2.532 2.532 0 1 1 0 5.064 2.532 2.532 0 0 1 0-5.064Zm0 1.064a1.468 1.468 0 1 0 0 2.936 1.468 1.468 0 0 0 0-2.936Z" clip-rule="evenodd"/>
<path fill="currentColor" d="M5.44 2.145a.532.532 0 0 1 0 1.043l-.107.01h-.667a1.47 1.47 0 0 0-1.468 1.468v.667a.532.532 0 0 1-1.064 0v-.667a2.532 2.532 0 0 1 2.532-2.532h.667l.108.011Zm5.893-.011a2.532 2.532 0 0 1 2.532 2.532v.667a.532.532 0 0 1-1.064 0v-.667c0-.81-.658-1.468-1.468-1.468h-.667a.532.532 0 0 1 0-1.064h.667Z"/>
</svg>


Binary file not shown.



@@ -0,0 +1,54 @@
#!/usr/bin/env bash
set -euo pipefail
if [[ "$(uname)" != "Darwin" ]]; then
echo "ensure_macos_permissions.sh only supports macOS" >&2
exit 1
fi
if ! command -v swift >/dev/null 2>&1; then
echo "swift is required to check macOS screen capture permissions" >&2
exit 1
fi
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PERM_SWIFT="$SCRIPT_DIR/macos_permissions.swift"
MODULE_CACHE="${TMPDIR:-/tmp}/codex-swift-module-cache"
mkdir -p "$MODULE_CACHE"
screen_capture_status() {
local json
json="$(swift -module-cache-path "$MODULE_CACHE" "$PERM_SWIFT" "$@")"
python3 -c 'import json, sys; data=json.loads(sys.argv[1]); print("1" if data.get("screenCapture") else "0")' "$json"
}
if [[ -n "${CODEX_SANDBOX:-}" ]]; then
echo "Screen capture checks are blocked in the sandbox; rerun with escalated permissions." >&2
exit 3
fi
if [[ "$(screen_capture_status)" == "1" ]]; then
echo "Screen Recording permission already granted."
exit 0
fi
cat <<'MSG'
This workflow needs macOS Screen Recording permission to capture screenshots.
macOS will show a single system prompt for Screen Recording. Approve it, then
return here. If macOS opens System Settings instead of prompting, enable Screen
Recording for your terminal and rerun the command.
MSG
# Request permission once after explaining why it is needed.
screen_capture_status --request >/dev/null || true
if [[ "$(screen_capture_status)" != "1" ]]; then
cat <<'MSG'
Screen Recording is still not granted.
Open System Settings > Privacy & Security > Screen Recording and enable it for
your terminal (and Codex if needed), then rerun your screenshot command.
MSG
exit 2
fi
echo "Screen Recording permission granted."


@@ -0,0 +1,22 @@
import AppKit
import Foundation
struct Response: Encodable {
let count: Int
let displays: [Int]
}
let count = max(NSScreen.screens.count, 1)
let displays = Array(1...count)
let response = Response(count: count, displays: displays)
let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys]
if let data = try? encoder.encode(response),
let json = String(data: data, encoding: .utf8) {
print(json)
} else {
fputs("{\"count\":\(count)}\n", stderr)
exit(1)
}


@@ -0,0 +1,40 @@
import CoreGraphics
import Foundation
struct Status: Encodable {
let screenCapture: Bool
let requested: Bool
}
let shouldRequest = CommandLine.arguments.contains("--request")
@available(macOS 10.15, *)
func screenCaptureGranted(request: Bool) -> Bool {
if CGPreflightScreenCaptureAccess() {
return true
}
if request {
_ = CGRequestScreenCaptureAccess()
return CGPreflightScreenCaptureAccess()
}
return false
}
let granted: Bool
if #available(macOS 10.15, *) {
granted = screenCaptureGranted(request: shouldRequest)
} else {
granted = true
}
let status = Status(screenCapture: granted, requested: shouldRequest)
let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys]
if let data = try? encoder.encode(status),
let json = String(data: data, encoding: .utf8) {
print(json)
} else {
fputs("{\"requested\":\(shouldRequest),\"screenCapture\":\(granted)}\n", stderr)
exit(1)
}


@@ -0,0 +1,126 @@
import AppKit
import CoreGraphics
import Foundation
struct Bounds: Encodable {
let x: Int
let y: Int
let width: Int
let height: Int
}
struct WindowInfo: Encodable {
let id: Int
let owner: String
let name: String
let layer: Int
let bounds: Bounds
let area: Int
}
struct Response: Encodable {
let count: Int
let selected: WindowInfo?
let windows: [WindowInfo]?
}
func value(for flag: String) -> String? {
guard let idx = CommandLine.arguments.firstIndex(of: flag) else {
return nil
}
let next = CommandLine.arguments.index(after: idx)
guard next < CommandLine.arguments.endIndex else {
return nil
}
return CommandLine.arguments[next]
}
let frontmostFlag = CommandLine.arguments.contains("--frontmost")
let explicitApp = value(for: "--app")
let frontmostName = frontmostFlag ? NSWorkspace.shared.frontmostApplication?.localizedName : nil
if frontmostFlag && frontmostName == nil {
fputs("{\"count\":0}\n", stderr)
exit(1)
}
let appFilter = (explicitApp ?? frontmostName)?.lowercased()
let nameFilter = value(for: "--window-name")?.lowercased()
let includeList = CommandLine.arguments.contains("--list")
let options: CGWindowListOption = [.optionOnScreenOnly, .excludeDesktopElements]
guard let raw = CGWindowListCopyWindowInfo(options, kCGNullWindowID) as? [[String: Any]] else {
fputs("{\"count\":0}\n", stderr)
exit(1)
}
var exactMatches: [WindowInfo] = []
var partialMatches: [WindowInfo] = []
exactMatches.reserveCapacity(raw.count)
partialMatches.reserveCapacity(raw.count)
for entry in raw {
guard let owner = entry[kCGWindowOwnerName as String] as? String else { continue }
let ownerLower = owner.lowercased()
if let appFilter, !ownerLower.contains(appFilter) { continue }
let name = (entry[kCGWindowName as String] as? String) ?? ""
if let nameFilter, !name.lowercased().contains(nameFilter) { continue }
guard let number = entry[kCGWindowNumber as String] as? Int else { continue }
let layer = (entry[kCGWindowLayer as String] as? Int) ?? 0
guard let boundsDict = entry[kCGWindowBounds as String] as? [String: Any] else { continue }
let x = Int((boundsDict["X"] as? Double) ?? 0)
let y = Int((boundsDict["Y"] as? Double) ?? 0)
let width = Int((boundsDict["Width"] as? Double) ?? 0)
let height = Int((boundsDict["Height"] as? Double) ?? 0)
if width <= 0 || height <= 0 { continue }
let bounds = Bounds(x: x, y: y, width: width, height: height)
let area = width * height
let info = WindowInfo(id: number, owner: owner, name: name, layer: layer, bounds: bounds, area: area)
if let appFilter, ownerLower == appFilter {
exactMatches.append(info)
} else {
partialMatches.append(info)
}
}
let windows: [WindowInfo]
if appFilter != nil && !exactMatches.isEmpty {
windows = exactMatches
} else {
windows = partialMatches
}
func rank(_ window: WindowInfo) -> (Int, Int) {
// Prefer normal-layer windows, then larger area.
let layerScore = window.layer == 0 ? 0 : 1
return (layerScore, -window.area)
}
let ordered: [WindowInfo]
if frontmostFlag {
ordered = windows
} else {
ordered = windows.sorted { rank($0) < rank($1) }
}
let selected = ordered.first
let list: [WindowInfo]?
if includeList {
list = ordered
} else {
list = nil
}
let response = Response(count: windows.count, selected: selected, windows: list)
let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys]
if let data = try? encoder.encode(response),
let json = String(data: data, encoding: .utf8) {
print(json)
} else {
fputs("{\"count\":\(windows.count)}\n", stderr)
exit(1)
}


@@ -0,0 +1,163 @@
param(
[string]$Path,
[ValidateSet("default", "temp")][string]$Mode = "default",
[string]$Format = "png",
[string]$Region,
[switch]$ActiveWindow,
[int]$WindowHandle
)
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
function Get-Timestamp {
Get-Date -Format "yyyy-MM-dd_HH-mm-ss"
}
function Get-DefaultDirectory {
# $HOME is a read-only automatic variable in PowerShell, so use a local name.
$userHome = [Environment]::GetFolderPath("UserProfile")
$pictures = Join-Path $userHome "Pictures"
$screenshots = Join-Path $pictures "Screenshots"
if (Test-Path $screenshots) { return $screenshots }
if (Test-Path $pictures) { return $pictures }
return $userHome
}
function New-DefaultFilename {
param([string]$Prefix)
if (-not $Prefix) { $Prefix = "screenshot" }
"$Prefix-$(Get-Timestamp).$Format"
}
function Resolve-OutputPath {
if ($Path) {
$expanded = [Environment]::ExpandEnvironmentVariables($Path)
$homeDir = [Environment]::GetFolderPath("UserProfile")
if ($expanded -eq "~") {
$expanded = $homeDir
} elseif ($expanded.StartsWith("~/") -or $expanded.StartsWith("~\")) {
$expanded = Join-Path $homeDir $expanded.Substring(2)
}
$full = [System.IO.Path]::GetFullPath($expanded)
if ((Test-Path $full) -and (Get-Item $full).PSIsContainer) {
$full = Join-Path $full (New-DefaultFilename "")
} elseif (($expanded.EndsWith("\") -or $expanded.EndsWith("/")) -and -not (Test-Path $full)) {
New-Item -ItemType Directory -Path $full -Force | Out-Null
$full = Join-Path $full (New-DefaultFilename "")
} elseif ([System.IO.Path]::GetExtension($full) -eq "") {
$full = "$full.$Format"
}
$parent = Split-Path -Parent $full
if ($parent) {
New-Item -ItemType Directory -Path $parent -Force | Out-Null
}
return $full
}
if ($Mode -eq "temp") {
$tmp = [System.IO.Path]::GetTempPath()
return Join-Path $tmp (New-DefaultFilename "codex-shot")
}
$dest = Get-DefaultDirectory
return Join-Path $dest (New-DefaultFilename "")
}
function Parse-Region {
if (-not $Region) { return $null }
$parts = $Region.Split(",") | ForEach-Object { $_.Trim() }
if ($parts.Length -ne 4) {
throw "Region must be x,y,w,h"
}
$values = $parts | ForEach-Object {
$out = 0
if (-not [int]::TryParse($_, [ref]$out)) {
throw "Region values must be integers"
}
$out
}
if ($values[2] -le 0 -or $values[3] -le 0) {
throw "Region width and height must be positive"
}
return $values
}
if ($Region -and $ActiveWindow) {
throw "Choose either -Region or -ActiveWindow"
}
if ($Region -and $WindowHandle) {
throw "Choose either -Region or -WindowHandle"
}
if ($ActiveWindow -and $WindowHandle) {
throw "Choose either -ActiveWindow or -WindowHandle"
}
$regionValues = Parse-Region
$outputPath = Resolve-OutputPath
Add-Type -AssemblyName System.Windows.Forms
Add-Type -AssemblyName System.Drawing
$imageFormat = switch ($Format.ToLowerInvariant()) {
"png" { [System.Drawing.Imaging.ImageFormat]::Png }
"jpg" { [System.Drawing.Imaging.ImageFormat]::Jpeg }
"jpeg" { [System.Drawing.Imaging.ImageFormat]::Jpeg }
"bmp" { [System.Drawing.Imaging.ImageFormat]::Bmp }
default { throw "Unsupported format: $Format" }
}
Add-Type @"
using System;
using System.Runtime.InteropServices;
public static class NativeMethods {
[StructLayout(LayoutKind.Sequential)]
public struct RECT {
public int Left;
public int Top;
public int Right;
public int Bottom;
}
[DllImport("user32.dll")]
public static extern IntPtr GetForegroundWindow();
[DllImport("user32.dll")]
public static extern bool GetWindowRect(IntPtr hWnd, out RECT rect);
}
"@
if ($regionValues) {
$x = $regionValues[0]
$y = $regionValues[1]
$w = $regionValues[2]
$h = $regionValues[3]
$bounds = New-Object System.Drawing.Rectangle($x, $y, $w, $h)
} elseif ($ActiveWindow -or $WindowHandle) {
$handle = if ($WindowHandle) { [IntPtr]$WindowHandle } else { [NativeMethods]::GetForegroundWindow() }
$rect = New-Object NativeMethods+RECT
if (-not [NativeMethods]::GetWindowRect($handle, [ref]$rect)) {
throw "Failed to get window bounds"
}
$width = $rect.Right - $rect.Left
$height = $rect.Bottom - $rect.Top
$bounds = New-Object System.Drawing.Rectangle($rect.Left, $rect.Top, $width, $height)
} else {
$vs = [System.Windows.Forms.SystemInformation]::VirtualScreen
$bounds = New-Object System.Drawing.Rectangle($vs.Left, $vs.Top, $vs.Width, $vs.Height)
}
$bitmap = New-Object System.Drawing.Bitmap($bounds.Width, $bounds.Height)
$graphics = [System.Drawing.Graphics]::FromImage($bitmap)
try {
$source = New-Object System.Drawing.Point($bounds.Left, $bounds.Top)
$target = [System.Drawing.Point]::Empty
$size = New-Object System.Drawing.Size($bounds.Width, $bounds.Height)
$graphics.CopyFromScreen($source, $target, $size)
$bitmap.Save($outputPath, $imageFormat)
} finally {
$graphics.Dispose()
$bitmap.Dispose()
}
Write-Output $outputPath


@@ -0,0 +1,585 @@
#!/usr/bin/env python3
"""Cross-platform screenshot helper for Codex skills."""
from __future__ import annotations
import argparse
import datetime as dt
import json
import os
import platform
import shutil
import subprocess
import tempfile
from pathlib import Path
SCRIPT_DIR = Path(__file__).resolve().parent
MAC_PERM_SCRIPT = SCRIPT_DIR / "macos_permissions.swift"
MAC_PERM_HELPER = SCRIPT_DIR / "ensure_macos_permissions.sh"
MAC_WINDOW_SCRIPT = SCRIPT_DIR / "macos_window_info.swift"
MAC_DISPLAY_SCRIPT = SCRIPT_DIR / "macos_display_info.swift"
TEST_MODE_ENV = "CODEX_SCREENSHOT_TEST_MODE"
TEST_PLATFORM_ENV = "CODEX_SCREENSHOT_TEST_PLATFORM"
TEST_WINDOWS_ENV = "CODEX_SCREENSHOT_TEST_WINDOWS"
TEST_DISPLAYS_ENV = "CODEX_SCREENSHOT_TEST_DISPLAYS"
TEST_PNG = (
b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x01\x00\x00\x00\x01"
b"\x08\x06\x00\x00\x00\x1f\x15\xc4\x89\x00\x00\x00\x0cIDAT\x08\xd7c"
b"\xf8\xff\xff?\x00\x05\xfe\x02\xfeA\xad\x1c\x1c\x00\x00\x00\x00IEND"
b"\xaeB`\x82"
)
def parse_region(value: str) -> tuple[int, int, int, int]:
parts = [p.strip() for p in value.split(",")]
if len(parts) != 4:
raise argparse.ArgumentTypeError("region must be x,y,w,h")
try:
x, y, w, h = (int(p) for p in parts)
except ValueError as exc:
raise argparse.ArgumentTypeError("region values must be integers") from exc
if w <= 0 or h <= 0:
raise argparse.ArgumentTypeError("region width and height must be positive")
return x, y, w, h
def test_mode_enabled() -> bool:
value = os.environ.get(TEST_MODE_ENV, "")
return value.lower() in {"1", "true", "yes", "on"}
def normalize_platform(value: str) -> str:
lowered = value.strip().lower()
if lowered in {"darwin", "mac", "macos", "osx"}:
return "Darwin"
if lowered in {"linux", "ubuntu"}:
return "Linux"
if lowered in {"windows", "win"}:
return "Windows"
return value
def test_platform_override() -> str | None:
value = os.environ.get(TEST_PLATFORM_ENV)
if value:
return normalize_platform(value)
return None
def parse_int_list(value: str) -> list[int]:
results: list[int] = []
for part in value.split(","):
part = part.strip()
if not part:
continue
try:
results.append(int(part))
except ValueError:
continue
return results
def test_window_ids() -> list[int]:
value = os.environ.get(TEST_WINDOWS_ENV, "101,102")
ids = parse_int_list(value)
return ids or [101]
def test_display_ids() -> list[int]:
value = os.environ.get(TEST_DISPLAYS_ENV, "1,2")
ids = parse_int_list(value)
return ids or [1]
def write_test_png(path: Path) -> None:
ensure_parent(path)
path.write_bytes(TEST_PNG)
def timestamp() -> str:
return dt.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
def default_filename(fmt: str, prefix: str = "screenshot") -> str:
return f"{prefix}-{timestamp()}.{fmt}"
def mac_default_dir() -> Path:
desktop = Path.home() / "Desktop"
try:
proc = subprocess.run(
["defaults", "read", "com.apple.screencapture", "location"],
check=False,
capture_output=True,
text=True,
)
location = proc.stdout.strip()
if location:
return Path(location).expanduser()
except OSError:
pass
return desktop
def default_dir(system: str) -> Path:
home = Path.home()
if system == "Darwin":
return mac_default_dir()
if system == "Windows":
pictures = home / "Pictures"
screenshots = pictures / "Screenshots"
if screenshots.exists():
return screenshots
if pictures.exists():
return pictures
return home
pictures = home / "Pictures"
screenshots = pictures / "Screenshots"
if screenshots.exists():
return screenshots
if pictures.exists():
return pictures
return home
def ensure_parent(path: Path) -> None:
try:
path.parent.mkdir(parents=True, exist_ok=True)
except OSError:
# Fall back to letting the capture command report a clearer error.
pass
def resolve_output_path(
requested_path: str | None, mode: str, fmt: str, system: str
) -> Path:
if requested_path:
path = Path(requested_path).expanduser()
if path.exists() and path.is_dir():
path = path / default_filename(fmt)
elif requested_path.endswith(("/", "\\")) and not path.exists():
path.mkdir(parents=True, exist_ok=True)
path = path / default_filename(fmt)
elif path.suffix == "":
path = path.with_suffix(f".{fmt}")
ensure_parent(path)
return path
if mode == "temp":
tmp_dir = Path(tempfile.gettempdir())
tmp_path = tmp_dir / default_filename(fmt, prefix="codex-shot")
ensure_parent(tmp_path)
return tmp_path
dest_dir = default_dir(system)
dest_path = dest_dir / default_filename(fmt)
ensure_parent(dest_path)
return dest_path
def multi_output_paths(base: Path, suffixes: list[str]) -> list[Path]:
if len(suffixes) <= 1:
return [base]
paths: list[Path] = []
for suffix in suffixes:
candidate = base.with_name(f"{base.stem}-{suffix}{base.suffix}")
ensure_parent(candidate)
paths.append(candidate)
return paths
def run(cmd: list[str]) -> None:
try:
subprocess.run(cmd, check=True)
except FileNotFoundError as exc:
raise SystemExit(f"required command not found: {cmd[0]}") from exc
except subprocess.CalledProcessError as exc:
raise SystemExit(f"command failed ({exc.returncode}): {' '.join(cmd)}") from exc
def swift_json(script: Path, extra_args: list[str] | None = None) -> dict:
module_cache = Path(tempfile.gettempdir()) / "codex-swift-module-cache"
module_cache.mkdir(parents=True, exist_ok=True)
cmd = ["swift", "-module-cache-path", str(module_cache), str(script)]
if extra_args:
cmd.extend(extra_args)
try:
proc = subprocess.run(cmd, check=True, capture_output=True, text=True)
except FileNotFoundError as exc:
raise SystemExit("swift not found; install Xcode command line tools") from exc
except subprocess.CalledProcessError as exc:
stderr = (exc.stderr or "").strip()
if "ModuleCache" in stderr and "Operation not permitted" in stderr:
raise SystemExit(
"swift needs module-cache access; rerun with escalated permissions"
) from exc
msg = stderr or (exc.stdout or "").strip() or "swift helper failed"
raise SystemExit(msg) from exc
try:
return json.loads(proc.stdout)
except json.JSONDecodeError as exc:
raise SystemExit(f"swift helper returned invalid JSON: {proc.stdout.strip()}") from exc
def macos_screen_capture_granted(request: bool = False) -> bool:
args = ["--request"] if request else []
payload = swift_json(MAC_PERM_SCRIPT, args)
return bool(payload.get("screenCapture"))
def ensure_macos_permissions() -> None:
if os.environ.get("CODEX_SANDBOX"):
raise SystemExit(
"screen capture checks are blocked in the sandbox; rerun with escalated permissions"
)
if macos_screen_capture_granted():
return
subprocess.run(["bash", str(MAC_PERM_HELPER)], check=False)
if not macos_screen_capture_granted():
raise SystemExit(
"Screen Recording permission is required; enable it in System Settings and retry"
)
def activate_app(app: str) -> None:
safe_app = app.replace('"', '\\"')
script = f'tell application "{safe_app}" to activate'
subprocess.run(["osascript", "-e", script], check=False, capture_output=True, text=True)
def macos_window_payload(args: argparse.Namespace, frontmost: bool, include_list: bool) -> dict:
flags: list[str] = []
if frontmost:
flags.append("--frontmost")
if args.app:
flags.extend(["--app", args.app])
if args.window_name:
flags.extend(["--window-name", args.window_name])
if include_list:
flags.append("--list")
return swift_json(MAC_WINDOW_SCRIPT, flags)
def macos_display_indexes() -> list[int]:
payload = swift_json(MAC_DISPLAY_SCRIPT)
displays = payload.get("displays") or []
indexes: list[int] = []
for item in displays:
try:
value = int(item)
except (TypeError, ValueError):
continue
if value > 0:
indexes.append(value)
return indexes or [1]
def macos_window_ids(args: argparse.Namespace, capture_all: bool) -> list[int]:
payload = macos_window_payload(
args,
frontmost=args.active_window,
include_list=capture_all,
)
if capture_all:
windows = payload.get("windows") or []
ids: list[int] = []
for item in windows:
win_id = item.get("id")
if win_id is None:
continue
try:
ids.append(int(win_id))
except (TypeError, ValueError):
continue
if ids:
return ids
selected = payload.get("selected") or {}
win_id = selected.get("id")
if win_id is not None:
try:
return [int(win_id)]
except (TypeError, ValueError):
pass
raise SystemExit("no matching macOS window found; try --list-windows to inspect ids")
def list_macos_windows(args: argparse.Namespace) -> None:
payload = macos_window_payload(args, frontmost=args.active_window, include_list=True)
windows = payload.get("windows") or []
if not windows:
print("no matching windows found")
return
for item in windows:
bounds = item.get("bounds") or {}
name = item.get("name") or ""
width = bounds.get("width", 0)
height = bounds.get("height", 0)
x = bounds.get("x", 0)
y = bounds.get("y", 0)
print(f"{item.get('id')}\t{item.get('owner')}\t{name}\t{width}x{height}+{x}+{y}")
def list_test_macos_windows(args: argparse.Namespace) -> None:
owner = args.app or "TestApp"
name = args.window_name or ""
ids = test_window_ids()
if args.active_window and ids:
ids = [ids[0]]
for idx, win_id in enumerate(ids, start=1):
window_name = name or f"Window {idx}"
print(f"{win_id}\t{owner}\t{window_name}\t800x600+0+0")
def resolve_macos_windows(args: argparse.Namespace) -> list[int]:
if args.app:
activate_app(args.app)
capture_all = not args.active_window
return macos_window_ids(args, capture_all=capture_all)
def resolve_test_macos_windows(args: argparse.Namespace) -> list[int]:
ids = test_window_ids()
if args.active_window and ids:
return [ids[0]]
return ids
def capture_macos(
args: argparse.Namespace,
output: Path,
*,
window_id: int | None = None,
display: int | None = None,
) -> None:
cmd = ["screencapture", "-x", f"-t{args.format}"]
if args.interactive:
cmd.append("-i")
if display is not None:
cmd.append(f"-D{display}")
effective_window_id = window_id if window_id is not None else args.window_id
if effective_window_id is not None:
cmd.append(f"-l{effective_window_id}")
elif args.region is not None:
x, y, w, h = args.region
cmd.append(f"-R{x},{y},{w},{h}")
cmd.append(str(output))
run(cmd)
def capture_linux(args: argparse.Namespace, output: Path) -> None:
scrot = shutil.which("scrot")
gnome = shutil.which("gnome-screenshot")
imagemagick = shutil.which("import")
xdotool = shutil.which("xdotool")
if args.region is not None:
x, y, w, h = args.region
if scrot:
run(["scrot", "-a", f"{x},{y},{w},{h}", str(output)])
return
if imagemagick:
geometry = f"{w}x{h}+{x}+{y}"
run(["import", "-window", "root", "-crop", geometry, str(output)])
return
raise SystemExit("region capture requires scrot or ImageMagick (import)")
if args.window_id is not None:
if imagemagick:
run(["import", "-window", str(args.window_id), str(output)])
return
raise SystemExit("window-id capture requires ImageMagick (import)")
if args.active_window:
if scrot:
run(["scrot", "-u", str(output)])
return
if gnome:
run(["gnome-screenshot", "-w", "-f", str(output)])
return
if imagemagick and xdotool:
win_id = (
subprocess.check_output(["xdotool", "getactivewindow"], text=True)
.strip()
)
run(["import", "-window", win_id, str(output)])
return
raise SystemExit("active-window capture requires scrot, gnome-screenshot, or import+xdotool")
if scrot:
run(["scrot", str(output)])
return
if gnome:
run(["gnome-screenshot", "-f", str(output)])
return
if imagemagick:
run(["import", "-window", "root", str(output)])
return
raise SystemExit("no supported screenshot tool found (scrot, gnome-screenshot, or import)")
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--path",
help="output file path or directory; overrides --mode",
)
parser.add_argument(
"--mode",
choices=("default", "temp"),
default="default",
help="default saves to the OS screenshot location; temp saves to the temp dir",
)
parser.add_argument(
"--format",
default="png",
help="image format/extension (default: png)",
)
parser.add_argument(
"--app",
help="macOS only: capture all matching on-screen windows for this app name",
)
parser.add_argument(
"--window-name",
help="macOS only: substring match for a window title (optionally scoped by --app)",
)
parser.add_argument(
"--list-windows",
action="store_true",
help="macOS only: list matching window ids instead of capturing",
)
parser.add_argument(
"--region",
type=parse_region,
help="capture region as x,y,w,h (pixel coordinates)",
)
parser.add_argument(
"--window-id",
type=int,
help="capture a specific window id when supported",
)
parser.add_argument(
"--active-window",
action="store_true",
help="capture the focused/active window only when supported",
)
parser.add_argument(
"--interactive",
action="store_true",
help="use interactive selection where the OS tool supports it",
)
args = parser.parse_args()
if args.region and args.window_id is not None:
raise SystemExit("choose either --region or --window-id, not both")
if args.region and args.active_window:
raise SystemExit("choose either --region or --active-window, not both")
if args.window_id is not None and args.active_window:
raise SystemExit("choose either --window-id or --active-window, not both")
if args.app and args.window_id is not None:
raise SystemExit("choose either --app or --window-id, not both")
if args.region and args.app:
raise SystemExit("choose either --region or --app, not both")
if args.region and args.window_name:
raise SystemExit("choose either --region or --window-name, not both")
if args.interactive and args.app:
raise SystemExit("choose either --interactive or --app, not both")
if args.interactive and args.window_name:
raise SystemExit("choose either --interactive or --window-name, not both")
if args.interactive and args.window_id is not None:
raise SystemExit("choose either --interactive or --window-id, not both")
if args.interactive and args.active_window:
raise SystemExit("choose either --interactive or --active-window, not both")
if args.list_windows and (args.region or args.window_id is not None or args.interactive):
raise SystemExit("--list-windows only supports --app, --window-name, and --active-window")
test_mode = test_mode_enabled()
system = platform.system()
if test_mode:
override = test_platform_override()
if override:
system = override
window_ids: list[int] = []
display_ids: list[int] = []
if system != "Darwin" and (args.app or args.window_name or args.list_windows):
raise SystemExit("--app/--window-name/--list-windows are supported on macOS only")
if system == "Darwin":
if test_mode:
if args.list_windows:
list_test_macos_windows(args)
return
if args.window_id is not None:
window_ids = [args.window_id]
elif args.app or args.window_name or args.active_window:
window_ids = resolve_test_macos_windows(args)
elif args.region is None and not args.interactive:
display_ids = test_display_ids()
else:
ensure_macos_permissions()
if args.list_windows:
list_macos_windows(args)
return
if args.window_id is not None:
window_ids = [args.window_id]
elif args.app or args.window_name or args.active_window:
window_ids = resolve_macos_windows(args)
elif args.region is None and not args.interactive:
display_ids = macos_display_indexes()
output = resolve_output_path(args.path, args.mode, args.format, system)
if test_mode:
if system == "Darwin":
if window_ids:
suffixes = [f"w{wid}" for wid in window_ids]
paths = multi_output_paths(output, suffixes)
for path in paths:
write_test_png(path)
for path in paths:
print(path)
return
if len(display_ids) > 1:
suffixes = [f"d{did}" for did in display_ids]
paths = multi_output_paths(output, suffixes)
for path in paths:
write_test_png(path)
for path in paths:
print(path)
return
write_test_png(output)
print(output)
return
if system == "Darwin":
if window_ids:
suffixes = [f"w{wid}" for wid in window_ids]
paths = multi_output_paths(output, suffixes)
for wid, path in zip(window_ids, paths):
capture_macos(args, path, window_id=wid)
for path in paths:
print(path)
return
if len(display_ids) > 1:
suffixes = [f"d{did}" for did in display_ids]
paths = multi_output_paths(output, suffixes)
for did, path in zip(display_ids, paths):
capture_macos(args, path, display=did)
for path in paths:
print(path)
return
capture_macos(args, output)
elif system == "Linux":
capture_linux(args, output)
elif system == "Windows":
raise SystemExit(
"Windows support lives in scripts/take_screenshot.ps1; run it with PowerShell"
)
else:
raise SystemExit(f"unsupported platform: {system}")
print(output)
if __name__ == "__main__":
main()


@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf of
any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@@ -0,0 +1,86 @@
---
name: "security-best-practices"
description: "Perform language and framework specific security best-practice reviews and suggest improvements. Trigger only when the user explicitly requests security best practices guidance, a security review/report, or secure-by-default coding help. Trigger only for supported languages (python, javascript/typescript, go). Do not trigger for general code review, debugging, or non-security tasks."
---
# Security Best Practices
## Overview
This skill describes how to identify the language and frameworks used in the current context, and then load security best-practice information for that language and/or those frameworks from this skill's references directory.
This information, if present, can be used to write new secure-by-default code, to passively detect major issues in existing code, or (if requested by the user) to produce a vulnerability report and suggest fixes.
## Workflow
The initial step for this skill is to identify ALL languages and ALL frameworks that you are being asked to use or that already exist in the scope of the project you are working in. Focus on the primary core frameworks. Often you will want to identify both frontend and backend languages and frameworks.
Then check this skill's references directory to see if there is any relevant documentation for the language and/or frameworks. Make sure you read ALL reference files that relate to the specific framework or language. The format of the filenames is `<language>-<framework>-<stack>-security.md`. You should also check whether there is a `<language>-general-<stack>-security.md`, which is agnostic to the framework you may be using.
If working on a web application which includes a frontend and a backend, make sure you have checked for reference documents for BOTH the frontend and backend!
If you are asked to make a web app which will include both a frontend and backend, but the frontend framework is not specified, also check out `javascript-general-web-frontend-security.md`. It is important that you understand how to secure both the frontend and backend.
If no relevant information is available in the skill's references directory, think a little bit about what you know about the language, the framework, and all well known security best practices for it. If you are unsure you can try to search online for documentation on security best practices.
From there, the skill can operate in a few ways.
1. The primary mode is to just use the information to write secure by default code from this point forward. This is useful for starting a new project or when writing new code.
2. The secondary mode is to passively detect vulnerabilities while working in the project and writing code for the user. Critical or very important vulnerabilities or major issues going against security guidance can be flagged and the user can be told about them. This passive mode should focus on the largest impact vulnerabilities and secure defaults.
3. The user can ask for a security report or ask to improve the security of the codebase. In this case a full report should be produced describing any ways the project fails to follow security best-practice guidance. The report should be prioritized and have clear sections by severity and urgency. Then offer to start working on fixes for these issues. See the Fixes section below.
## Workflow Decision Tree
- If the language/framework is unclear, inspect the repo to determine it and list your evidence.
- If matching guidance exists in `references/`, load only the relevant files and follow their instructions.
- If no matching guidance exists, consider whether you know any well-known security best practices for the chosen language and/or frameworks. If asked to generate a report anyway, let the user know that concrete guidance is not available (you can still generate the report and flag clearly critical vulnerabilities).
# Overrides
While these references contain the security best practices for languages and frameworks, customers may have cases where they need to bypass or override these practices. Pay attention to specific rules and instructions in the project's documentation and prompt files which may require you to override certain best practices. When overriding a best practice, you MAY report it to the user, but do not fight with them. If a security best practice needs to be bypassed / ignored for some project specific reason, you can also suggest to add documentation about this to the project so it is clear why the best practice is not being followed and to follow that bypass in the future.
# Report Format
When producing a report, you should write the report as a markdown file in `security_best_practices_report.md` or some other location if provided by the user. You can ask the user where they would like the report to be written to.
The report should have a short executive summary at the top.
The report should be clearly delineated into multiple sections based on the severity of the vulnerability. The report should focus on the most critical findings, as these have the highest impact for the user. All findings should be noted with a numeric ID to make them easier to reference.
For critical findings include a one sentence impact statement.
Once the report is written, also report it to the user directly, although you may be less verbose. You can offer to explain any of the findings or the reasons behind the security best practices guidance if the user wants more info on any findings.
Important: When referencing code in the report, make sure to find and include line numbers for the code you are referencing.
After you write the report file, summarize the findings to the user and tell them where the final report was written.
# Fixes
If you produced a report, let the user read the report and ask to begin performing fixes.
If you passively found a critical finding, notify the user and ask if they would like you to fix this finding.
When producing fixes, focus on fixing a single finding at a time. The fixes should have concise clear comments explaining that the new code is based on the specific security best practice, and perhaps a very short reason why it would be dangerous to not do it in this way.
Always consider whether the changes you want to make will impact the functionality of the user's code, and whether they may cause regressions in how the project currently works. It is often the case that insecure code is relied on for other reasons (which is why insecure code lives on for so long). Avoid breaking the user's project, as this may make them reluctant to apply security fixes in the future. It is better to write a well-thought-out fix, informed by the rest of the project, than a quick slapdash change.
Always follow any normal change or commit flow the user has configured. If making git commits, provide clear commit messages explaining this is to align with security best practices. Try to avoid bunching a number of unrelated findings into a single commit.
Always follow any normal testing flows the user has configured (if any) to confirm that your changes are not introducing regressions. Consider the second order impacts the changes may have and inform the user before making them if there are any.
# General Security Advice
Below are a few pieces of secure coding advice that apply to almost any language or framework.
### Avoid Using Incrementing IDs for Public IDs of Resources
When assigning an ID to a resource that will be exposed to the internet, avoid using small auto-incrementing IDs. Use a longer random UUIDv4 or random hex string instead. This prevents users from learning the quantity of a resource and from guessing resource IDs.
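The random-hex approach above can be sketched in a few lines of Go; the 16-byte length and the `newPublicID` name are illustrative choices, not requirements:

```go
package main

import (
	"crypto/rand" // cryptographically secure randomness, not math/rand
	"encoding/hex"
	"fmt"
)

// newPublicID returns a 32-character random hex string suitable for use as
// a public resource identifier. Unlike an auto-incrementing integer, it
// leaks nothing about resource counts and cannot be guessed.
func newPublicID() (string, error) {
	b := make([]byte, 16) // 128 bits of randomness
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	id, err := newPublicID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```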
### A note on TLS
While TLS is important for production deployments, most development work happens with TLS disabled or provided by an out-of-scope TLS proxy. Because of this, be very careful not to report lack of TLS as a security issue. Also be careful with "Secure" cookies: they should only be set if the application will actually be served over TLS. If they are set on non-TLS deployments (such as local dev or testing), they will break the application. You can provide an env var or other flag to override the Secure setting so it stays off until a TLS production deployment. Additionally, avoid recommending HSTS: it is dangerous without a full understanding of its lasting impacts (it can cause major outages and user lockout), and it is not generally recommended for the scope of projects being reviewed by codex.

View File

@@ -0,0 +1,4 @@
interface:
display_name: "Security Best Practices"
short_description: "Security reviews and secure-by-default guidance"
default_prompt: "Review this codebase for security best practices and suggest secure-by-default improvements."

View File

@@ -0,0 +1,826 @@
# Go (Golang) Security Spec (Go 1.25.x, Standard Library, net/http)
This document is designed as a **security spec** that supports:
1) **Secure-by-default code generation** for new Go code.
2) **Security review / vulnerability hunting** in existing Go code (passive “notice issues while working” and active “scan the repo and report findings”).
It is intentionally written as a set of **normative requirements** (“MUST/SHOULD/MAY”) plus **audit rules** (what bad patterns look like, how to detect them, and how to fix/mitigate them).
--------------------------------------------------------------------
## 0) Safety, boundaries, and anti-abuse constraints (MUST FOLLOW)
- MUST NOT request, output, log, or commit secrets (API keys, passwords, private keys, session cookies, JWTs, database URLs with credentials, signing keys, client secrets).
- MUST NOT “fix” security by disabling protections (e.g., `InsecureSkipVerify`, `GOSUMDB=off` for public modules, wildcard CORS + credentials, removing auth checks, disabling CSRF defenses on cookie-auth apps).
- MUST provide **evidence-based findings** during audits: cite file paths, code snippets, build/deploy configs, and concrete values that justify the claim.
- MUST treat uncertainty honestly: if a control might exist in infrastructure (reverse proxy, WAF, service mesh, platform config), report it as “not visible in app code; verify at runtime/config.”
- MUST keep fixes minimal, correct, and production-safe; avoid introducing breaking changes without warning (especially around auth/session flows, and proxies).
--------------------------------------------------------------------
## 1) Operating modes
### 1.1 Generation mode (default)
When asked to write new Go code or modify existing code:
- MUST follow every **MUST** requirement in this spec.
- SHOULD follow every **SHOULD** requirement unless the user explicitly says otherwise.
- MUST prefer safe-by-default APIs and proven libraries over custom security code.
- MUST avoid introducing new risky sinks (shell execution, dynamic template execution, serving user files as HTML, unsafe redirects, weak crypto, unbounded parsing, etc.).
### 1.2 Passive review mode (always on while editing)
While working anywhere in a Go repo (even if the user did not ask for a security scan):
- MUST “notice” violations of this spec in touched/nearby code.
- SHOULD mention issues as they come up, with a brief explanation + safe fix.
### 1.3 Active audit mode (explicit scan request)
When the user asks to “scan”, “audit”, or “hunt for vulns”:
- MUST systematically search the codebase for violations of this spec.
- MUST output findings in a structured format (see §2.3).
Recommended audit order:
1) Build/deploy entrypoints: `main.go`, `cmd/*`, Dockerfiles, Kubernetes manifests, systemd units, CI workflows.
2) Go toolchain & dependency policy: Go version, modules, `go.mod/go.sum`, proxy/sumdb settings, govulncheck usage.
3) Secret management and config loading (env, files, secret stores) + logging patterns.
4) HTTP server configuration (timeouts, body limits, proxy trust, security headers).
5) AuthN/AuthZ boundaries, session/cookie settings, token validation.
6) CSRF protections for cookie-authenticated state-changing endpoints.
7) Template usage and output encoding (XSS), and any “render template from string” behavior (SSTI).
8) File handling (uploads/downloads/path traversal/temp files), static file serving.
9) Injection sinks: SQL, OS command execution, SSRF/outbound fetch, open redirects.
10) Concurrency/resource exhaustion (unbounded goroutines/queues, missing timeouts/contexts).
11) Use of `unsafe` / `cgo` / `reflect` in security-sensitive paths.
12) Debug/diagnostic endpoints (pprof/expvar/metrics) exposure.
13) Cryptography usage (randomness, password hashing).
--------------------------------------------------------------------
## 2) Definitions and review guidance
### 2.1 Untrusted input (treat as attacker-controlled unless proven otherwise)
Examples include:
- `*http.Request` fields: `r.URL.Path`, `r.URL.RawQuery`, `r.Form`, `r.PostForm`, headers, cookies, `r.Body`
- Path parameters from routers (including values extracted from URL paths)
- JSON/XML/YAML bodies, multipart form parts, uploaded files
- Any data from external systems (webhooks, third-party APIs, message queues)
- Any persisted user content (DB rows) that originated from users
- Configuration values that might be attacker-influenced in some deployments (headers set by upstream proxies, environment variables in multi-tenant systems)
### 2.2 State-changing request
A request is state-changing if it can create/update/delete data, change auth/session state, trigger side effects (purchase, email send, webhook send), or initiate privileged actions.
### 2.3 Required audit finding format
For each issue found, output:
- Rule ID:
- Severity: Critical / High / Medium / Low
- Location: file path + function/handler name + line(s)
- Evidence: the exact code/config snippet
- Impact: what could go wrong, who can exploit it
- Fix: safe change (prefer minimal diff)
- Mitigation: defense-in-depth if immediate fix is hard
- False positive notes: what to verify if uncertain (edge configs, proxy behavior, auth assumptions)
--------------------------------------------------------------------
## 3) Secure baseline: minimum production configuration (MUST in production)
This is the smallest “production baseline” that prevents common Go misconfigurations.
### 3.1 Toolchain, patching, and dependency hygiene (MUST)
- MUST run a supported Go major version and keep to the latest patch releases.
- MUST treat Go standard library patch releases as security-relevant (many security fixes land in stdlib components like `net/http`, `crypto/*`, parsing packages).
- MUST use Go modules with committed `go.mod` and `go.sum`.
- MUST NOT disable module authenticity mechanisms for public modules (checksum DB) unless you have a controlled, documented replacement.
- MUST run `govulncheck` (source scan and/or binary scan) in CI and address findings.
### 3.2 HTTP server baseline (MUST for network-facing services)
If the program serves HTTP (directly or via a framework built on `net/http`):
- MUST configure an `http.Server` with explicit timeouts and header limits.
- MUST set request body size limits (global and per-route as needed).
- MUST avoid exposing diagnostic endpoints (pprof/expvar) publicly.
- SHOULD set a consistent set of security headers (or verify they are set at the edge).
- MUST set cookie security attributes for any cookies you issue.
- SHOULD implement rate limiting and abuse controls for auth and expensive endpoints.
Illustrative baseline skeleton (adjust to your project):
- Create a dedicated mux (avoid implicit global defaults unless intentionally managed).
- Wrap handlers with: panic-safe error handling, request ID, logging, auth, and limits.
--------------------------------------------------------------------
## 4) Rules (generation + audit)
Each rule contains: required practice, insecure patterns, detection hints, and remediation.
### GO-DEPLOY-001: Keep the Go toolchain and standard library updated (security releases)
Severity: Medium
NOTE: Upgrading dependencies and the core Go version can break projects in unexpected ways. Focus only on security-critical dependencies, and if an issue is noticed, let the user know rather than upgrading automatically.
Required:
- MUST run a supported Go major release and apply patch releases promptly.
- SHOULD treat patch releases as security-relevant, even if your application code didn't change.
Insecure patterns:
- Production builds pinned to old Go versions without a patching process.
- Docker images like `golang:1.xx` or custom base images that are not updated regularly.
- CI pipelines that intentionally suppress Go updates.
Detection hints:
- Inspect CI (`.github/workflows`, `gitlab-ci.yml`, etc.) for `go-version:` or toolchain setup.
- Inspect Dockerfiles for `FROM golang:` tags.
- Inspect `go.mod` `go` directive and any toolchain pinning.
Fix:
- Upgrade to the latest patch of a supported Go version.
- Add an automated check (CI) that fails when Go is below an approved minimum.
Notes:
- Go publishes regular minor releases that frequently include security fixes across standard library packages.
---
### GO-SUPPLY-001: Go module authenticity MUST NOT be disabled for public dependencies
Severity: High
Required:
- MUST keep module checksum verification enabled for public modules.
- SHOULD commit `go.sum` and treat changes as security-sensitive.
- MUST NOT use insecure module fetching settings for public modules.
- MAY configure private module behavior using `GOPRIVATE`/`GONOSUMDB` for private repos, but must do so narrowly and intentionally.
Insecure patterns:
- `GOSUMDB=off` in CI or production build environments for public modules.
- `GONOSUMDB=*` or overly broad patterns that effectively disable verification.
- `GOINSECURE=*` or broad `GOINSECURE` patterns for public modules.
- `GOPROXY=direct` everywhere without a clear policy.
Detection hints:
- Search build configs for `GOSUMDB`, `GONOSUMDB`, `GOINSECURE`, `GOPROXY`, `GOPRIVATE`.
- Look for documentation/scripts that recommend disabling checksum DB “to make builds work”.
Fix:
- Restore defaults for public module verification.
- For private modules:
- Set `GOPRIVATE=your.private.domain/*`
- Configure an internal proxy or direct fetching, and restrict `GONOSUMDB` to private patterns only.
Notes:
- Disabling checksum verification removes an important integrity layer against targeted or compromised upstream delivery.
---
### GO-CONFIG-001: Secrets must be externalized and never logged or committed
Severity: High (Critical if credentials are committed)
Required:
- MUST load secrets from environment variables, secret managers, or secure config files with restricted permissions.
- MUST NOT hard-code secrets in Go source, test fixtures that may reach production, or build args.
- MUST NOT log secrets or full credential-bearing connection strings.
- SHOULD fail closed in production if required secrets are missing.
Insecure patterns:
- String constants containing tokens/keys/passwords.
- `.env` files or config files with secrets committed to repo.
- Logging `os.Environ()`, dumping full configs, or printing DSNs.
Detection hints:
- Search for suspicious literals (`API_KEY`, `SECRET`, `PASSWORD`, `Authorization:`).
- Inspect config loaders and logging statements.
- Inspect CI logs or debug print paths.
Fix:
- Move secrets to a secret store / environment variables.
- Redact sensitive fields in logs.
- Add secret scanning to CI and pre-commit.
---
### GO-HTTP-001: HTTP servers MUST set timeouts and MaxHeaderBytes
Severity: High (DoS risk)
Required:
- MUST set: `ReadHeaderTimeout`, and SHOULD set `ReadTimeout`, `WriteTimeout`, `IdleTimeout` as appropriate for the service.
- MUST set `MaxHeaderBytes` to a justified limit for your application.
- MUST NOT rely on default zero-values for timeouts in production for internet-facing servers.
Insecure patterns:
- `http.ListenAndServe(":8080", handler)` with a default `http.Server` (no explicit timeouts).
- `&http.Server{}` with timeouts left at zero.
- Missing `MaxHeaderBytes`.
Detection hints:
- Search for `http.ListenAndServe(`, `ListenAndServeTLS(`, `Server{` and inspect configured fields.
- Check for reverse proxies; even with a proxy, app-level timeouts still matter.
Fix:
- Use `http.Server{ReadHeaderTimeout: ..., ReadTimeout: ..., WriteTimeout: ..., IdleTimeout: ..., MaxHeaderBytes: ...}`.
- Calibrate timeouts per endpoint type (streaming vs JSON APIs).
Notes:
- `net/http` documents that these timeouts exist and that zero/negative values mean “no timeout”; production services should choose explicit values.
---
### GO-HTTP-002: Request body and multipart parsing MUST be size-bounded
Severity: Medium (DoS risk; can be High for upload-heavy apps)
Required:
- MUST enforce a global maximum request body size for endpoints that accept bodies.
- MUST enforce strict multipart upload limits and avoid unbounded form parsing.
- SHOULD enforce per-route limits when some endpoints legitimately need larger bodies.
- SHOULD set upstream (proxy) limits as defense-in-depth.
Insecure patterns:
- Reading `r.Body` with `io.ReadAll(r.Body)` without a size cap.
- Calling `r.ParseMultipartForm(...)` with overly large limits (or forgetting size controls).
- Accepting file uploads with no limits on file size, number of parts, or total body size.
Detection hints:
- Search for `io.ReadAll(r.Body)`, `json.NewDecoder(r.Body)`, `ParseMultipartForm`, `FormFile`, `multipart`.
- Look for missing `http.MaxBytesReader` or equivalent per-handler limiting.
- Look for “upload” endpoints and check limits.
Fix:
- Wrap request bodies with `http.MaxBytesReader(w, r.Body, maxBytes)` before parsing.
- For multipart, set conservative limits and validate file sizes/part counts explicitly.
- Set proxy limits (e.g., at ingress) in addition to app limits.
Notes:
- There are known vulnerability classes and advisories related to excessive resource consumption in multipart/form parsing; treat unbounded parsing as a security issue.
---
### GO-DEPLOY-002: Diagnostic endpoints (pprof/expvar/metrics) MUST NOT be publicly exposed
Severity: High
NOTE: This only applies to production configurations. These endpoints are often used for debugging or development. If found, confirm that they would be reachable from the actual production deployment.
Required:
- MUST NOT expose `net/http/pprof` handlers on a public internet-facing listener without strong access controls.
- SHOULD run diagnostics on a separate, internal-only listener (loopback/VPC-only) and require auth.
- MUST review what diagnostic endpoints reveal (stack traces, memory, command lines, environment, internal URLs).
Insecure patterns:
- Side-effect import `import _ "net/http/pprof"` in a server binary with a public mux.
- `/debug/pprof/*` reachable without auth.
- `/debug/vars` (expvar) reachable without auth.
Detection hints:
- Search for `net/http/pprof` imports (including blank imports).
- Search for route prefixes `/debug/pprof`, `/debug/vars`.
- Check whether `http.DefaultServeMux` is used and whether any debug handlers register globally.
Fix:
- Remove diagnostics from production builds, or bind them to an internal-only listener.
- Add strong authentication/authorization (and ideally network-level restrictions).
Notes:
- pprof is typically imported for its side effect of registering HTTP handlers under `/debug/pprof/`.
---
### GO-HTTP-003: Reverse proxy and forwarded header trust MUST be explicit
Severity: High (auth, URL generation, logging/auditing correctness)
Required:
- If behind a reverse proxy, MUST define which proxy is trusted and how client IP/scheme/host are derived.
- MUST NOT trust `X-Forwarded-For`, `X-Forwarded-Proto`, `Forwarded`, or similar headers from the open internet.
- MUST ensure “secure cookie” logic, redirects, and absolute URL generation do not rely on spoofable headers.
Insecure patterns:
- Using `r.Header.Get("X-Forwarded-For")` as the client IP without validating the proxy boundary.
- Deriving “is HTTPS” from `X-Forwarded-Proto` without confirming it came from a trusted proxy.
- Using forwarded `Host` values for password reset links without allowlisting.
Detection hints:
- Search for `X-Forwarded-For`, `X-Forwarded-Proto`, `Forwarded`, `Real-IP`, and any custom “client IP” helpers.
- Inspect ingress/proxy configs; if not visible, mark as “verify at edge”.
Fix:
- Enforce proxy trust at the edge and in app:
- Accept forwarded headers only from known proxy IP ranges.
- Prefer platform-provided mechanisms where available.
- If generating external links, use a configured allowlisted canonical origin (not the request's Host header).
---
### GO-HTTP-004: Security headers SHOULD be set (in app or at the edge)
Severity: Medium
Required (typical web app serving browsers):
- SHOULD set:
- `Content-Security-Policy` (CSP) appropriate to the app. NOTE: the most important directive to set is `script-src`; other directives are less critical and can generally be omitted for ease of development.
- `X-Content-Type-Options: nosniff`
- Clickjacking protection (`X-Frame-Options` and/or CSP `frame-ancestors`)
- `Referrer-Policy` and `Permissions-Policy` where appropriate
- MUST ensure cookies have secure attributes (see GO-HTTP-005).
NOTE:
- These headers may be set via reverse proxy/CDN; if not visible in app code, report as “verify at edge”.
Insecure patterns:
- No security headers anywhere (app or edge) for a browser-facing app.
- CSP missing for apps rendering untrusted content.
Detection hints:
- Search for middleware setting headers: `w.Header().Set("Content-Security-Policy", ...)`, etc.
- Search for reverse proxy config that sets headers.
Fix:
- Add centralized header middleware in Go, or configure at the edge.
- Keep CSP realistic; avoid `unsafe-inline` where possible.
---
### GO-HTTP-005: Cookies MUST use secure attributes in production
Severity: Medium
Required (production, HTTPS):
- MUST set `Secure` on cookies that carry auth/session state. IMPORTANT NOTE: Only set `Secure` in production when TLS is configured. When running in a local dev environment over HTTP, do not set the `Secure` attribute. Do this conditionally based on whether the app is running in production mode, and include a property like `SESSION_COOKIE_SECURE` that can be used to disable `Secure` cookies when testing over HTTP.
- MUST set `HttpOnly` on auth/session cookies.
- SHOULD set `SameSite=Lax` by default (or `Strict` if compatible), and only use `None` when necessary (and only with `Secure`).
- SHOULD set bounded lifetimes (`Max-Age`/`Expires`) appropriate to the app.
Insecure patterns:
- Setting auth/session cookies without `Secure` in HTTPS deployments.
- Cookies without `HttpOnly` for session identifiers.
- `SameSite=None` for cookie-authenticated apps without a strong CSRF strategy.
Detection hints:
- Search for `http.SetCookie`, `&http.Cookie{`, `Set-Cookie`.
- Inspect cookie flags in auth/session code.
Fix:
- Set the correct fields on `http.Cookie` and centralize cookie creation.
Notes:
- SameSite is defense-in-depth and does not replace CSRF protections for cookie-auth apps.
---
### GO-HTTP-006: Cookie-authenticated state-changing endpoints MUST be CSRF-protected
Severity: High
- IMPORTANT NOTE: If cookies are not used for auth (e.g., pure bearer token in Authorization header with no ambient cookies), CSRF is not a risk for those endpoints.
Required:
- MUST protect all state-changing endpoints (POST/PUT/PATCH/DELETE) that rely on cookies for authentication.
- SHOULD use a well-tested CSRF library/middleware rather than rolling your own.
- MAY use additional defenses (Origin/Referer checks, Fetch Metadata, SameSite cookies), but tokens remain the primary defense for cookie-authenticated apps.
If tokens are impractical, or for small applications:
- MUST at a minimum require a custom header to be set and set the session cookie to `SameSite=Lax` (e.g., `SESSION_COOKIE_SAMESITE=lax`), as this is the strongest method besides requiring a form token and may be much easier to implement.
Insecure patterns:
- Cookie-authenticated JSON endpoints that mutate state with no CSRF checks.
- Using GET for state-changing actions.
Detection hints:
- Enumerate all non-GET routes and identify auth mechanism.
- Look for CSRF middleware usage; if absent, treat as suspicious in browser-facing apps.
Fix:
- Add CSRF middleware and ensure it covers all state-changing routes.
- If the service is an API intended for non-browser clients, avoid cookie auth; use Authorization headers.
---
### GO-HTTP-007: CORS must be explicit and least-privilege
Severity: Medium (High if misconfigured with credentials)
Required:
- If CORS is not needed, MUST keep it disabled.
- If CORS is needed:
- MUST allowlist trusted origins (do not reflect arbitrary origins)
- MUST be careful with credentialed requests; do not combine broad origins with cookies
- SHOULD restrict allowed methods/headers
Insecure patterns:
- `Access-Control-Allow-Origin: *` paired with cookies (`Access-Control-Allow-Credentials: true`).
- Reflecting `Origin` without validation.
Detection hints:
- Search for `Access-Control-Allow-` header setting.
- Search for CORS middleware configuration.
Fix:
- Implement strict origin allowlists and minimal methods/headers.
- Ensure cookie-auth endpoints are not exposed cross-origin unless required.
---
### GO-XSS-001: Use html/template and avoid bypassing auto-escaping with untrusted data
Severity: High
Required:
- MUST use `html/template` for HTML rendering (not `text/template`).
- MUST NOT convert untrusted data into “trusted” template types (`template.HTML`, `template.JS`, `template.URL`, etc.).
- SHOULD keep templates static and controlled by developers; treat dynamic templates as high risk.
- MUST NOT serve user-uploaded HTML/JS as active content unless explicitly intended and safely sandboxed.
Insecure patterns:
- `text/template` used to generate HTML.
- Using `template.HTML(userInput)` or similar typed wrappers.
- Directly writing unescaped user content into HTML responses.
Detection hints:
- Search for `text/template`, `template.New(...).Parse(...)`, and typed wrappers like `template.HTML(`.
- Inspect handlers that return HTML with string concatenation.
Fix:
- Use `html/template` and pass untrusted data as data, not markup.
- If you must allow limited HTML, use a vetted HTML sanitizer and still be careful with attributes/URLs.
---
### GO-SSTI-001: Never parse/execute templates from untrusted input (SSTI)
Severity: Critical
Required:
- MUST NOT call `template.Parse` / `template.ParseFiles` / `template.New(...).Parse(...)` on template text influenced by untrusted input.
- MUST treat “user-defined templates” as a special high-risk design:
- MUST use heavy sandboxing and strict allowlists
- MUST isolate execution (process/container boundary) if truly required
Insecure patterns:
- `tmpl := template.Must(template.New("x").Parse(r.FormValue("tmpl")))`
- Reading templates from uploads / DB entries and executing them in the same trust domain as server code.
Detection hints:
- Search for `.Parse(` and trace the origin of the template string.
- Look for “custom email templates”, “user theming templates”, etc.
Fix:
- Replace with safe substitution mechanisms (no code execution).
- If templates must be user-controlled, isolate and sandbox aggressively.
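One safe-substitution replacement is plain placeholder replacement against an allowlist of variable names, with no parsing or code execution at all. The placeholder syntax and allowed names here are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// renderUserTemplate substitutes a fixed allowlist of placeholders into a
// user-supplied string. Unlike executing it with text/template or
// html/template, there is no code execution: unknown placeholders and any
// other syntax are left as inert text.
func renderUserTemplate(tmpl string, vars map[string]string) string {
	allowed := []string{"name", "order_id"} // illustrative allowlist
	pairs := make([]string, 0, len(allowed)*2)
	for _, k := range allowed {
		pairs = append(pairs, "{{"+k+"}}", vars[k])
	}
	return strings.NewReplacer(pairs...).Replace(tmpl)
}

func main() {
	out := renderUserTemplate("Hello {{name}}, order {{order_id}} shipped.",
		map[string]string{"name": "Ada", "order_id": "42"})
	fmt.Println(out)
}
```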
---
### GO-PATH-001: Prevent path traversal and unsafe file serving
Severity: High
Required:
- MUST NOT pass user-controlled paths to `os.Open`, `os.ReadFile`, `http.ServeFile`, or `http.FileServer` without strict validation and base-dir enforcement.
- MUST treat `..`, absolute paths, and OS-specific path tricks as hostile input.
- SHOULD store user uploads outside any static web root; serve through controlled handlers.
- MUST avoid directory listing for sensitive file trees.
Insecure patterns:
- `http.ServeFile(w, r, r.URL.Query().Get("path"))`
- `os.Open(filepath.Join(baseDir, userPath))` without checking that the result stays under `baseDir`
- `http.FileServer(http.Dir("."))` serving the project root or user-writable directories
Detection hints:
- Search for `ServeFile(`, `FileServer(`, `http.Dir(`, `os.Open(`, `ReadFile(`, `filepath.Join(`.
- Trace whether path components come from request/DB.
Fix:
- Use an allowlist of file identifiers (e.g., database IDs) mapped to server-side paths.
- Enforce base directory containment after cleaning and joining.
- Serve active formats as downloads (`Content-Disposition: attachment`) unless explicitly intended.
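A minimal sketch of base-directory containment after cleaning and joining (the example paths assume Unix separators; `filepath.Rel`, or `os.Root` on recent Go versions, are alternative containment mechanisms):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins an untrusted relative path onto baseDir and rejects any
// result that escapes baseDir (traversal via "..", absolute paths, etc.).
func safeJoin(baseDir, untrusted string) (string, error) {
	if filepath.IsAbs(untrusted) {
		return "", fmt.Errorf("absolute paths not allowed")
	}
	joined := filepath.Join(baseDir, untrusted) // Join also runs Clean
	base := filepath.Clean(baseDir)
	if joined != base && !strings.HasPrefix(joined, base+string(filepath.Separator)) {
		return "", fmt.Errorf("path escapes base directory")
	}
	return joined, nil
}

func main() {
	if _, err := safeJoin("/srv/files", "../../etc/passwd"); err == nil {
		panic("traversal was accepted")
	}
}
```

The key point is that the check runs on the *joined and cleaned* result, not on the raw input, so encoded or nested `..` segments are normalized before the prefix test.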
---
### GO-UPLOAD-001: File uploads must be validated, stored safely, and served safely
Severity: High
Required:
- MUST enforce upload size limits (app + edge).
- MUST validate file type using allowlists and content checks (not only extensions).
- MUST store uploads outside executable/static roots when possible.
- SHOULD generate server-side filenames (random IDs) and avoid trusting original names.
- MUST serve potentially active formats safely (download attachment) unless explicitly intended.
Insecure patterns:
- Accepting arbitrary file types and serving them back inline.
- Using user-supplied filename as storage path.
- Missing size/type validation.
Detection hints:
- Search for `multipart`, `FormFile`, `ParseMultipartForm`, `io.Copy` to disk.
- Check where files are stored and how they are served.
Fix:
- Implement allowlist validation + safe storage + safe serving.
- Add scanning/quarantine workflows where applicable.
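A sketch of allowlist validation by sniffed content (the 5 MiB cap and the PNG/JPEG allowlist are illustrative choices, not requirements):

```go
package main

import (
	"fmt"
	"net/http"
)

const maxUpload = 5 << 20 // example application-level cap: 5 MiB

// allowedTypes maps sniffed MIME types to server-chosen extensions;
// everything else is rejected. Sniffing inspects content, not the
// client-supplied filename or Content-Type header.
var allowedTypes = map[string]string{
	"image/png":  ".png",
	"image/jpeg": ".jpg",
}

// checkUpload validates size and sniffed content type for an upload body.
func checkUpload(data []byte) (ext string, err error) {
	if len(data) > maxUpload {
		return "", fmt.Errorf("upload too large")
	}
	// DetectContentType sniffs at most the first 512 bytes.
	sniffed := http.DetectContentType(data)
	ext, ok := allowedTypes[sniffed]
	if !ok {
		return "", fmt.Errorf("type %q not allowed", sniffed)
	}
	return ext, nil
}

func main() {
	if _, err := checkUpload([]byte("<html><script>")); err == nil {
		panic("active content accepted")
	}
}
```

In a real handler the size cap should also be enforced before reading (`http.MaxBytesReader`), and the stored name should be a server-generated ID plus the allowlisted extension.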
---
### GO-INJECT-001: Prevent SQL injection (parameterized queries / ORM)
Severity: High
Required:
- MUST use parameterized queries or an ORM that parameterizes under the hood.
- MUST NOT build SQL by string concatenation / `fmt.Sprintf` / string interpolation with untrusted input.
Insecure patterns:
- `fmt.Sprintf("SELECT ... WHERE id=%s", r.URL.Query().Get("id"))`
- `query := "UPDATE users SET role='" + role + "' WHERE id=" + id`
Detection hints:
- Grep for `SELECT`, `INSERT`, `UPDATE`, `DELETE` and check how query strings are built.
- Trace untrusted data into `db.Query`, `db.Exec`, `QueryRow`, etc.
Fix:
- Replace with placeholders (`?`, `$1`, etc.) and pass parameters separately.
- Validate and type-check IDs before use.
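The fix can be sketched as follows; the `users` table, column names, and `?` placeholder style are hypothetical (placeholder syntax depends on the driver, e.g. `$1` for Postgres):

```go
package main

import (
	"database/sql"
	"fmt"
	"strconv"
)

// validateID type-checks an untrusted ID before it goes anywhere near SQL.
func validateID(raw string) (int64, error) {
	id, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || id <= 0 {
		return 0, fmt.Errorf("invalid id %q", raw)
	}
	return id, nil
}

// userName uses a placeholder; the driver transmits the parameter
// separately, so the value is never interpreted as SQL text.
func userName(db *sql.DB, rawID string) (string, error) {
	id, err := validateID(rawID)
	if err != nil {
		return "", err
	}
	var name string
	err = db.QueryRow("SELECT name FROM users WHERE id = ?", id).Scan(&name)
	return name, err
}

func main() {
	if _, err := validateID("1 OR 1=1"); err == nil {
		panic("malformed id accepted")
	}
}
```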
---
### GO-INJECT-002: Prevent OS command injection; avoid shelling out with untrusted input
Severity: Critical to High (depends on exposure)
Required:
- MUST avoid executing external commands with attacker-controlled strings.
- If subprocess is necessary:
- MUST use `exec.CommandContext` with an argument list (not `sh -c`).
- MUST NOT pass untrusted input to a shell (`bash -c`, `sh -c`, PowerShell).
- SHOULD use strict allowlists for any variable component (subcommand, flags, filenames).
- MUST assume CLI tools may interpret attacker-controlled args as flags or special values.
Insecure patterns:
- `exec.Command("sh", "-c", userString)`
- `exec.Command("bash", "-c", fmt.Sprintf("tool %s", user))`
- Calling the shell to get glob expansion for user-supplied globs.
Detection hints:
- Search for `os/exec`, `exec.Command(`, `CommandContext(`, `"sh"`, `"bash"`, `"-c"`.
- Trace untrusted input into command name/args.
Fix:
- Use library APIs instead of subprocesses.
- Hardcode command and allowlist/validate args.
- If a shell is unavoidable, escape robustly and treat as high risk (prefer avoiding).
Notes:
- The Go `os/exec` package intentionally does not invoke a shell; wrapping commands in `sh -c` reintroduces shell injection hazards.
---
### GO-SSRF-001: Prevent SSRF in outbound HTTP requests
Severity: Medium (High in cloud/LAN environments)
- Note: For small standalone projects this is less important. It matters most when the service is deployed on a LAN or alongside other services listening on the same host.
Required:
- MUST treat outbound requests to user-provided URLs as high risk.
- SHOULD allowlist hosts/domains for any user-influenced URL fetch.
- SHOULD block access to localhost/private IP ranges/link-local addresses and cloud metadata endpoints.
- MUST restrict schemes to `http`/`https` (no `file:`, `gopher:`, etc.).
- MUST set client timeouts and restrict redirects.
Insecure patterns:
- `http.Get(r.URL.Query().Get("url"))`
- “URL preview” / “webhook test” endpoints that fetch arbitrary URLs.
Detection hints:
- Search for `http.Get`, `client.Do`, and URL values derived from requests/DB.
- Identify features that fetch remote resources.
Fix:
- Parse URLs strictly; enforce scheme and allowlisted hostnames.
- Resolve DNS and enforce IP-range restrictions (with care for DNS rebinding).
- Set timeouts, disable redirects unless needed, and cap response sizes.
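A sketch of strict URL validation with a host allowlist and literal-IP range checks (`api.example.com` is a hypothetical allowlist entry; note this alone does not stop DNS rebinding, which needs a dialer that re-checks the resolved IP at connect time):

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

var allowedHosts = map[string]bool{"api.example.com": true} // hypothetical allowlist

// checkOutboundURL enforces scheme + host allowlist and blocks literal IPs
// in loopback/private/link-local ranges.
func checkOutboundURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("unparseable url: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("scheme %q not allowed", u.Scheme)
	}
	host := u.Hostname()
	if ip := net.ParseIP(host); ip != nil {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			return fmt.Errorf("ip %s is not publicly routable", ip)
		}
	}
	if !allowedHosts[host] {
		return fmt.Errorf("host %q not in allowlist", host)
	}
	return nil
}

func main() {
	if err := checkOutboundURL("http://169.254.169.254/latest/meta-data/"); err == nil {
		panic("cloud metadata endpoint allowed")
	}
}
```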
---
### GO-HTTPCLIENT-001: Outbound HTTP clients MUST set timeouts and close bodies
Severity: High (DoS and resource exhaustion)
Required:
- MUST set an overall timeout on `http.Client` usage (or equivalent per-request deadlines via context + transport timeouts).
- MUST ensure `resp.Body.Close()` is called for all successful requests (typically `defer resp.Body.Close()` immediately after error check).
- SHOULD limit response body reads (do not `io.ReadAll` unbounded responses).
- SHOULD restrict redirects for security-sensitive fetches (SSRF, auth flows).
Insecure patterns:
- Using `http.DefaultClient` / `http.Get` for user-influenced destinations with no timeout policy.
- Missing `defer resp.Body.Close()` leading to resource leaks.
- `io.ReadAll(resp.Body)` with no limit.
Detection hints:
- Search for `http.Get(`, `http.Post(`, `client := &http.Client{}` without `Timeout`, `client.Do(` and missing closes.
- Search for `io.ReadAll(resp.Body)`.
Fix:
- Use a configured client with timeouts.
- Always close response bodies.
- Use bounded readers (`io.LimitReader`) for large/untrusted responses.
Notes:
- The net/http package exposes `DefaultClient` as a zero-valued `http.Client`, which can easily lead to “no timeout” behavior unless configured.
---
### GO-REDIRECT-001: Prevent open redirects
Severity: Medium (can be High with auth flows)
Required:
- MUST validate redirect targets derived from untrusted input (`next`, `redirect`, `return_to`).
- SHOULD prefer only same-site relative paths.
- SHOULD fall back to a safe default on validation failure.
Insecure patterns:
- `http.Redirect(w, r, r.URL.Query().Get("next"), http.StatusFound)` with no validation.
Detection hints:
- Search for `http.Redirect(` and check origin of the location.
Fix:
- Allowlist internal paths or known domains.
- Reject absolute URLs unless explicitly needed and allowlisted.
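A minimal same-site validator for a `next`-style parameter, falling back to a safe default (the rejected `//` and `/\` prefixes are scheme-relative and backslash tricks that browsers may treat as external navigation):

```go
package main

import "strings"

// safeNext accepts only same-site relative paths for a redirect target
// and returns a safe default for anything else.
func safeNext(next string) string {
	if next == "" ||
		!strings.HasPrefix(next, "/") ||
		strings.HasPrefix(next, "//") ||
		strings.HasPrefix(next, "/\\") {
		return "/" // safe default
	}
	return next
}

func main() {
	if safeNext("https://evil.example/phish") != "/" {
		panic("absolute URL passed through")
	}
}
```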
---
### GO-CRYPTO-001: Cryptographic randomness MUST come from crypto/rand
Severity: High (Critical if used for auth/session tokens or keys)
Required:
- MUST use `crypto/rand` for:
- session IDs, password reset tokens, API keys, CSRF tokens, nonces
- encryption keys, signing keys, salts when required
- MUST NOT use `math/rand` for any security-sensitive value.
- SHOULD use built-in helpers that produce appropriately strong tokens when available.
Insecure patterns:
- `math/rand.Seed(time.Now().UnixNano())` followed by token generation for auth or sessions.
- Using UUIDv4-like constructs built from `math/rand`.
Detection hints:
- Search for `math/rand`, `rand.Seed`, `rand.Intn` in code that touches auth/session/token flows.
- Search for custom token generators.
Fix:
- Switch to `crypto/rand` (`rand.Reader`, `rand.Read`, or secure token helpers).
- Ensure sufficient entropy and use URL-safe encoding.
Notes:
- The crypto/rand package provides secure randomness APIs and token generation helpers.
---
### GO-AUTH-001: Password storage MUST use adaptive hashing (bcrypt/argon2id) and safe comparisons
Severity: High
Required:
- MUST hash passwords using an adaptive password hashing function (bcrypt or argon2id).
- MUST NOT store plaintext passwords or reversible encryption of passwords.
- MUST compare secrets in constant time when relevant (tokens, MACs, API keys) to reduce timing leaks.
- SHOULD ensure password policies do not exceed algorithm constraints (e.g., bcrypt has input length limits; handle long passphrases appropriately).
Insecure patterns:
- `sha256(password)` stored as password hash.
- Plaintext password storage.
- Comparing secrets with `==` in timing-sensitive contexts.
Detection hints:
- Search for `sha1`, `sha256`, `md5` used on passwords.
- Search for `bcrypt`/`argon2` usage; if absent, suspect.
- Search for `==` comparisons on tokens/API keys.
Fix:
- Use `bcrypt.GenerateFromPassword` / `CompareHashAndPassword` or argon2id with recommended parameters.
- Use constant-time compare helpers when comparing MACs/tokens.
Notes:
- Go provides bcrypt in `golang.org/x/crypto/bcrypt`, and constant-time comparisons in `crypto/subtle`.
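Since bcrypt lives outside the standard library, this sketch shows only the stdlib constant-time piece for tokens/API keys (`bcrypt.CompareHashAndPassword` already compares safely on its own); hashing both sides first fixes the comparison length, so length differences leak nothing either:

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
)

// equalToken compares two secrets in constant time via crypto/subtle.
func equalToken(presented, stored string) bool {
	a := sha256.Sum256([]byte(presented))
	b := sha256.Sum256([]byte(stored))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

func main() {
	if equalToken("attacker-guess", "real-api-key") {
		panic("mismatched secrets compared equal")
	}
}
```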
---
### GO-CONC-001: Data races and concurrency hazards MUST be treated as security-relevant
Severity: Medium to High (depends on what races affect)
Required:
- MUST run tests with the race detector (`go test -race`) in CI for security-sensitive services.
- MUST fix detected races; do not suppress without deep justification.
- SHOULD treat shared mutable state in handlers as high risk; enforce synchronization or avoid shared mutability.
Insecure patterns:
- Global maps/slices mutated from multiple goroutines without a mutex.
- Caches or auth/session state stored in globals without concurrency protection.
- Racy access to authorization state (can lead to bypasses or inconsistent enforcement).
Detection hints:
- Search for `var someMap = map[...]...` used in handlers.
- Look for missing `sync.Mutex`, `sync.Map`, channels, or other synchronization.
- Ensure CI includes `-race` and that it runs relevant tests.
Fix:
- Add proper synchronization or redesign to avoid shared mutable state.
- Add race tests and run them continuously.
Notes:
- The Go race detector only finds races that occur in executed code paths; improve test coverage and run realistic workloads with `-race` where feasible.
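A sketch of the synchronization fix for the "global map mutated from handlers" pattern; the unguarded version of this store is exactly what `go test -race` flags:

```go
package main

import "sync"

// sessionStore guards a shared map with a mutex so concurrent handlers
// cannot race on it.
type sessionStore struct {
	mu       sync.Mutex
	sessions map[string]string // sessionID -> userID
}

func newSessionStore() *sessionStore {
	return &sessionStore{sessions: make(map[string]string)}
}

func (s *sessionStore) Put(id, user string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sessions[id] = user
}

func (s *sessionStore) Get(id string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	u, ok := s.sessions[id]
	return u, ok
}

func main() {
	store := newSessionStore()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); store.Put("sid", "alice") }()
	}
	wg.Wait()
	if u, ok := store.Get("sid"); !ok || u != "alice" {
		panic("store lost a write")
	}
}
```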
---
### GO-UNSAFE-001: Use of unsafe/cgo MUST be minimized and audited like memory-unsafe code
Severity: High (Critical in high-risk code paths)
Required:
- SHOULD avoid importing `unsafe` in application code unless absolutely necessary.
- If `unsafe` is used, MUST treat it as “manual memory safety” requiring careful review and test coverage.
- If `cgo` is used, MUST treat the C/C++ boundary as memory-unsafe; apply secure coding practices on the C side and isolate where possible.
Insecure patterns:
- Widespread `unsafe.Pointer` casts in parsing, serialization, auth, or network code.
- `cgo` used for parsing or security boundaries without sandboxing.
Detection hints:
- Search for `import "unsafe"`, `unsafe.Pointer`, `// #cgo`, `import "C"`.
- Prioritize review where unsafe touches untrusted input.
Fix:
- Replace unsafe/cgo usage with safe standard library alternatives where possible.
- Isolate unsafe code in small, well-tested modules with fuzz/race tests.
Notes:
- The unsafe package explicitly provides operations that step around Go's type safety guarantees.
--------------------------------------------------------------------
## 5) Practical scanning heuristics (how to “hunt”)
When actively scanning, use these high-signal patterns:
Toolchain & dependencies:
- `FROM golang:` (Dockerfiles), `go-version:` (CI), `toolchain go` (go.mod), pinned old versions
- `GOSUMDB=off`, `GOINSECURE`, `GONOSUMDB`, `GOPROXY=direct`
- `replace` directives in `go.mod` to forks/paths
- `govulncheck` missing in CI
HTTP server hardening:
- `http.ListenAndServe(`, `ListenAndServeTLS(`, `&http.Server{` with missing timeouts
- `ReadHeaderTimeout: 0`, `ReadTimeout: 0`, `WriteTimeout: 0`, `IdleTimeout: 0`, missing `MaxHeaderBytes`
Body parsing / DoS:
- `io.ReadAll(r.Body)`, `json.NewDecoder(r.Body)` without size cap
- `ParseMultipartForm`, `FormFile`, `multipart.NewReader` without explicit limits
- Missing `http.MaxBytesReader`
Debug exposure:
- `import _ "net/http/pprof"`
- `/debug/pprof`, `/debug/vars`
Templates / XSS / SSTI:
- `text/template` used for HTML output
- `template.HTML(`, `template.JS(`, `template.URL(` with user-controlled data
- `.Parse(` on user-controlled strings
Files:
- `http.ServeFile(` with user path
- `http.FileServer(http.Dir(` pointing at repo root or uploads
- `os.Open(filepath.Join(base, user))` without containment checks
Injection:
- SQL building with `fmt.Sprintf`, string concatenation near `db.Query/Exec`
- `exec.Command("sh","-c", ...)`, `exec.Command("bash","-c", ...)`
SSRF / outbound HTTP:
- `http.Get(userURL)`, `client.Do(req)` where URL comes from request/DB
- Missing client timeout, missing `resp.Body.Close()`, unbounded `io.ReadAll(resp.Body)`
Crypto:
- `math/rand` in token/session generation
- `InsecureSkipVerify: true`
- Password hashing with `sha256`/`md5` instead of bcrypt/argon2
Concurrency:
- Shared maps/slices mutated from handlers without locks
- CI lacking `go test -race`
Always try to confirm:
- data origin (untrusted vs trusted)
- sink type (template/SQL/subprocess/files/http)
- protective controls present (limits, validation, allowlists, middleware, network controls)
--------------------------------------------------------------------
## 6) Sources (accessed 2026-01-28)
Primary Go documentation:
- Go Security Policy — https://go.dev/doc/security/policy
- Go Release History (security fixes in patch releases) — https://go.dev/doc/devel/release
- Go 1.25 Release Notes — https://go.dev/doc/go1.25
- net/http (server timeouts, MaxHeaderBytes, DefaultClient) — https://pkg.go.dev/net/http
- html/template (auto-escaping and trusted-template assumptions) — https://pkg.go.dev/html/template
- crypto/tls (MinVersion defaults, InsecureSkipVerify warnings) — https://pkg.go.dev/crypto/tls
- crypto/rand (secure randomness, token helpers) — https://pkg.go.dev/crypto/rand
- crypto/subtle (constant-time comparisons) — https://pkg.go.dev/crypto/subtle
- os/exec (no shell by default; command execution guidance) — https://pkg.go.dev/os/exec
- unsafe (bypasses type safety) — https://go.dev/src/unsafe/unsafe.go
- net/http/pprof (debug endpoints) — https://pkg.go.dev/net/http/pprof
- cmd/go (module authentication via go.sum/checksum DB; env vars like GOINSECURE) — https://pkg.go.dev/cmd/go
- Module Mirror and Checksum Database Launched (Go blog) — https://go.dev/blog/module-mirror-launch
- govulncheck documentation — https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck
- Go Race Detector documentation — https://go.dev/doc/articles/race_detector
- bcrypt (password hashing) — https://pkg.go.dev/golang.org/x/crypto/bcrypt
- Go vulnerability entry example (multipart resource consumption) — https://pkg.go.dev/vuln/GO-2023-1569
OWASP Cheat Sheet Series (general web security):
- Session Management — https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html
- CSRF Prevention — https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html
- SSRF Prevention — https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html
- XSS Prevention — https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
- HTTP Security Response Headers — https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html
# Frontend JavaScript/TypeScript Web Security Spec (Vanilla Browser JS/TS, Modern Browsers)
This document is designed as a **security spec** that supports:
1. **Secure-by-default code generation** for new frontend JavaScript/TypeScript (no specific framework assumed).
2. **Security review / vulnerability hunting** in existing frontend code (passive “notice issues while working” and active “scan the repo and report findings”).
It is intentionally written as a set of **normative requirements** (“MUST/SHOULD/MAY”) plus **audit rules** (what bad patterns look like, how to detect them, and how to fix/mitigate them).
---
## 0) Safety, boundaries, and anti-abuse constraints (MUST FOLLOW)
* MUST NOT request, output, log, hard-code, or commit secrets (API keys intended to be secret, private keys, passwords, OAuth refresh tokens, session tokens, cookies).
Notes:
* Frontend code is inherently observable by end users. If a value must remain secret, it must not be in browser-delivered code.
* If the project uses “public” keys (e.g., publishable analytics keys), they MUST be treated as non-secret and scoped accordingly.
* MUST NOT “fix” security by disabling protections (e.g., weakening CSP with `unsafe-inline`/`unsafe-eval` without justification, removing origin checks for `postMessage`, switching to `innerHTML` for convenience, accepting arbitrary redirects/URLs, or turning off sanitization).
* MUST provide **evidence-based findings** during audits: cite file paths, code snippets, and relevant HTML/CSP/config values that justify the claim.
* MUST treat uncertainty honestly:
* Security headers (CSP, frame-ancestors, etc.) might be set by server/edge/CDN rather than in repo code. If not visible, report as “not visible here; verify at runtime/edge config.” (Also note that `<meta http-equiv=...>` only simulates a subset of headers; don't assume other security headers exist just because a meta tag exists.) ([MDN Web Docs][1])
---
## 1) Operating modes
### 1.1 Generation mode (default)
When asked to write new frontend JS/TS code or modify existing code:
* MUST follow every **MUST** requirement in this spec.
* SHOULD follow every **SHOULD** requirement unless the user explicitly says otherwise.
* MUST prefer safe-by-default browser APIs and proven libraries over custom security code (especially for HTML sanitization).
* MUST avoid introducing new risky sinks (DOM XSS injection sinks like `innerHTML`, navigation to `javascript:` URLs, dynamic code execution via `eval`/`Function`, unsafe `postMessage`, unsafe third-party script loading, etc.). ([OWASP Cheat Sheet Series][2])
### 1.2 Passive review mode (always on while editing)
While working anywhere in a frontend repo (even if the user did not ask for a security scan):
* MUST “notice” violations of this spec in touched/nearby code.
* SHOULD mention issues as they come up, with a brief explanation + safe fix.
### 1.3 Active audit mode (explicit scan request)
When the user asks to “scan”, “audit”, or “hunt for vulns”:
* MUST systematically search the codebase for violations of this spec.
* MUST output findings in a structured format (see §2.3).
Recommended audit order:
1. HTML entrypoints (`index.html`, server-rendered templates), script/style includes, and any CSP delivery (header vs meta). ([W3C][3])
2. DOM XSS sinks (`innerHTML`, `document.write`, `insertAdjacentHTML`, event-handler attributes) and their data sources (URL params/hash, storage, postMessage, API responses). ([OWASP Cheat Sheet Series][2])
3. Navigation/redirect handling (`window.location*`, link targets, URL allowlists) including `javascript:` URL hazards. ([MDN Web Docs][4])
4. Cross-origin communication (`postMessage`, iframe embed patterns, sandboxing). ([MDN Web Docs][5])
5. Storage of sensitive data (localStorage/sessionStorage) and assumptions about trust. ([OWASP Cheat Sheet Series][6])
6. Third-party scripts / tag managers / CDNs, and integrity controls (SRI) and policy controls (CSP). ([OWASP Cheat Sheet Series][7])
7. DOM clobbering gadgets and unsafe reliance on `window`/`document` named properties. ([OWASP Cheat Sheet Series][8])
---
## 2) Definitions and review guidance
### 2.1 Untrusted input (treat as attacker-controlled unless proven otherwise)
Examples include:
* URL-derived data: `location.href`, `location.search`, `location.hash`, `document.baseURI`, `new URLSearchParams(location.search)`, routing fragments. ([OWASP Cheat Sheet Series][2])
* DOM content that may include user-controlled markup (comments, profiles, CMS content, markdown-to-HTML output, etc.), especially if inserted dynamically. ([OWASP Cheat Sheet Series][2])
* `postMessage` event data (`event.data`) and metadata (`event.origin`) from other windows/frames. ([MDN Web Docs][5])
* Browser storage: `localStorage`, `sessionStorage`, IndexedDB (contents can be attacker-influenced via XSS or local machine access; never treat as “trusted”). ([OWASP Cheat Sheet Series][6])
* Any data returned from network calls (even if from “your API”), because it may contain stored attacker content that becomes dangerous only when inserted into the DOM. ([OWASP Cheat Sheet Series][2])
### 2.2 Dangerous sink (DOM XSS / code execution sink)
A sink is any API/operation that can execute script or interpret attacker-controlled strings as HTML/JS/URL in a security-sensitive way. High-signal sinks include:
* HTML parsing / insertion: `innerHTML`, `outerHTML`, `insertAdjacentHTML`, `document.write`, `document.writeln`. ([OWASP Cheat Sheet Series][2])
* Dynamic code execution: `eval`, `new Function`, `setTimeout("...")`, `setInterval("...")`. ([MDN Web Docs][10])
* Navigation to script-bearing URLs (e.g., `javascript:`) via setters like `Location.href`/`window.location` (and via link `href` if attacker-controlled). ([MDN Web Docs][4])
* Setting event handler attributes from strings, e.g. `setAttribute("onclick", "...")`. ([OWASP Cheat Sheet Series][2])
### 2.3 Required audit finding format
For each issue found, output:
* Rule ID:
* Severity: Critical / High / Medium / Low
* Location: file path + function/class/module + line(s)
* Evidence: the exact code/config snippet
* Impact: what could go wrong, who can exploit it
* Fix: safe change (prefer minimal diff)
* Mitigation: defense-in-depth if immediate fix is hard
* False positive notes: what to verify if uncertain
---
## 3) Secure baseline: minimum production configuration (MUST in production)
This is the smallest baseline that prevents common frontend JS/TS security misconfigurations. Some items are “in repo” (HTML/JS) and some may live at the server/edge.
### 3.1 Content Security Policy (CSP) baseline (SHOULD; MUST for high-risk apps)
* SHOULD deliver CSP via HTTP response headers when possible.
* MAY deliver CSP via an HTML `<meta http-equiv="Content-Security-Policy" ...>` tag when you cannot set headers (e.g., purely static hosting constraints). ([MDN Web Docs][1])
* If using CSP via `<meta http-equiv>`, MUST understand the limitations:
* The policy only applies to content that follows the meta element (so it must appear very early, before any scripts/resources you want governed). ([W3C][3])
* The following directives are **not supported** in a meta-delivered policy and will be ignored: `report-uri`, `frame-ancestors`, and `sandbox`. ([W3C][3])
* “Report-only” CSP cannot be set via a meta element. ([W3C][3])
Practical baseline goals:
* Avoid script sources `unsafe-inline` and `unsafe-eval` (they significantly weaken CSP's value against XSS). ([MDN Web Docs][10])
* Prefer nonce- or hash-based script policies if you need inline scripts. ([MDN Web Docs][10])
* Consider enabling Trusted Types enforcement where feasible. ([MDN Web Docs][11])
### 3.2 Third-party scripts baseline (SHOULD)
* SHOULD minimize third-party script execution and treat it as equivalent privilege to first-party JS (it runs with your origin's privileges). ([OWASP Cheat Sheet Series][7])
* SHOULD use Subresource Integrity (SRI) for third-party scripts/styles loaded from CDNs. ([MDN Web Docs][12])
### 3.3 Cross-window communication baseline (SHOULD)
* SHOULD restrict `postMessage` communications to explicit origins, and validate both origin and message shape. ([MDN Web Docs][5])
---
## 4) Rules (generation + audit)
Each rule contains: required practice, insecure patterns, detection hints, and remediation.
### JS-XSS-001: Do not inject untrusted HTML into the DOM (avoid `innerHTML` and friends)
Severity: Critical if you can prove attacker-controlled input can reach these APIs; otherwise Medium
Required:
* MUST treat `innerHTML`, `outerHTML`, and `insertAdjacentHTML` as dangerous sinks when their input can contain untrusted data. ([OWASP Cheat Sheet Series][2])
* MUST prefer safe DOM APIs that do not parse HTML:
* `textContent` for text. ([OWASP Cheat Sheet Series][2])
* `document.createElement`, `appendChild`, `setAttribute` for non-event-handler attributes. ([OWASP Cheat Sheet Series][2])
* If HTML insertion is truly required, SHOULD sanitize with a well-reviewed HTML sanitizer and strongly consider enforcing Trusted Types to confine usage to audited code paths. ([MDN Web Docs][11])
Insecure patterns:
* `el.innerHTML = userInput`
* `el.insertAdjacentHTML('beforeend', userInput)`
* `el.outerHTML = userInput`
Detection hints:
* Search for: `.innerHTML`, `.outerHTML`, `insertAdjacentHTML(`.
* Trace the origin of inserted string: URL params/hash, postMessage, storage, API responses, DOM attributes. ([OWASP Cheat Sheet Series][2])
Fix:
* Replace with `textContent` for plain text. ([OWASP Cheat Sheet Series][2])
* For structured UI, build DOM nodes explicitly.
* For “rich text” requirements:
* Sanitize using an allowlist-based sanitizer.
* Prefer returning safe “components” instead of arbitrary HTML strings.
* Use Trusted Types enforcement to ensure only `TrustedHTML` reaches sinks where supported. ([MDN Web Docs][11])
Mitigation:
* Deploy a strict CSP and consider Trusted Types enforcement (`require-trusted-types-for 'script'`). ([MDN Web Docs][10])
False positive notes:
* If the string is provably constant or fully generated from trusted constants, it may be safe. Still prefer safer APIs.
---
### JS-XSS-002: Avoid `document.write` / `document.writeln` (XSS + document clobbering hazards)
Severity: Critical if you can prove attacker-controlled input can reach these APIs; otherwise Medium
Required:
* MUST avoid `document.write()` and `document.writeln()` in production code (they are XSS vectors and can be abused with crafted HTML even if some browsers block injected `<script>` in certain situations). ([MDN Web Docs][13])
* If legacy use is unavoidable, MUST ensure no untrusted input reaches these APIs and SHOULD enforce Trusted Types (`TrustedHTML`) where supported. ([MDN Web Docs][14])
Insecure patterns:
* `document.write(userInput)`
* `document.writeln(getParam('q'))`
Detection hints:
* Search for `document.write(`, `document.writeln(`. ([OWASP Cheat Sheet Series][2])
Fix:
* Replace with DOM manipulation (`createElement`, `appendChild`) or safe text insertion (`textContent`). ([OWASP Cheat Sheet Series][2])
Mitigation:
* Strict CSP + Trusted Types enforcement reduces blast radius if a sink remains. ([MDN Web Docs][10])
---
### JS-XSS-003: Do not use string-to-code execution (`eval`, `new Function`, string timeouts)
Severity: Critical if you can prove attacker-controlled input can reach these APIs; otherwise Medium
Required:
* MUST NOT pass untrusted data to:
* `eval()`
* `new Function(...)`
* `setTimeout("...")` / `setInterval("...")` with string arguments ([MDN Web Docs][10])
* SHOULD avoid these APIs entirely in modern frontend code; refactor to non-eval logic. ([MDN Web Docs][10])
* MUST NOT “fix CSP breakage” by adding `unsafe-eval` unless there is a documented, reviewed justification and compensating controls. ([MDN Web Docs][10])
Insecure patterns:
* `eval(userInput)`
* `new Function("return " + userInput)()`
* `setTimeout(userInput, 0)` where userInput is a string
Detection hints:
* Search for `eval(`, `new Function`, `setTimeout("`, `setInterval("`.
* Also search for construction of code strings used later.
Fix:
* Replace dynamic code with:
* structured data + explicit branching/handlers,
* JSON parsing (`JSON.parse`) instead of `eval` for JSON. ([OWASP Cheat Sheet Series][2])
Mitigation:
* CSP that blocks `eval()`-like APIs by default, and avoid `unsafe-eval`. ([MDN Web Docs][10])
* Consider Trusted Types for controlled cases, but treat it as a hardening layer, not a license to keep eval patterns. ([MDN Web Docs][10])
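The "structured data instead of code" refactor can be sketched as follows; the `parseFilter` name and the shape check are illustrative:

```javascript
// Untrusted JSON from a query parameter: parse it, never eval it.
// JSON.parse interprets data only; eval would execute embedded code.
function parseFilter(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // malformed input is rejected, not executed
  }
  // Validate the shape explicitly instead of trusting the payload.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return null;
  }
  return parsed;
}

// A payload that would run code under eval() is inert under JSON.parse:
const hostile = "alert(1)";
if (parseFilter(hostile) !== null) throw new Error("hostile string was not rejected");
```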
---
### JS-XSS-004: Do not set event handler attributes from strings (e.g., `setAttribute("onclick", "...")`)
Severity: High
Required:
* MUST NOT use `setAttribute("on…", string)` or similar patterns with untrusted data; this coerces strings into executable code in the event-handler context. ([OWASP Cheat Sheet Series][2])
* SHOULD prefer `addEventListener` with function references.
Insecure patterns:
* `el.setAttribute("onclick", userInput)`
* `el.onclick = userControlledString` (string assignment)
Detection hints:
* Search for `.setAttribute("on`, `.onclick =`, `.onmouseover =`, etc.
* Trace whether RHS can be influenced by URL/hash/storage/postMessage. ([OWASP Cheat Sheet Series][2])
Fix:
* Replace with `addEventListener("click", () => { ... })`.
* If dynamic dispatch is needed, use an allowlisted mapping from identifiers to functions (no string eval). ([OWASP Cheat Sheet Series][2])
---
### JS-URL-001: Sanitize and allowlist URLs before navigation (especially `window.location` / `location.replace`)
Severity: Low (High if you can prove an attacker can fully control the URL)
IMPORTANT: This can cause a lot of false positives. Please perform extra analysis to determine if the url is fully attacker controlled. If not fully attacker controlled, then this is informational at best.
NOTE: It may be legitimate functionality to redirect to an arbitrary URL. If that is the goal of the feature, then at a minimum ensure the scheme is validated even when any origin is allowed.
Required:
* MUST treat any assignment to navigation targets as security-sensitive:
* `window.location = ...`
* `location.href = ...`
* `location.assign(...)`
* `location.replace(...)` ([MDN Web Docs][4])
* MUST prevent navigation to `javascript:` URLs (and generally other script-bearing/active schemes), especially when input is derived from URL params, storage, or messages; only allow `http:` and `https:`. ([MDN Web Docs][4])
* SHOULD validate/allowlist the destination. A safe baseline is:
* Allow only same-origin relative paths, OR
* Allow only a strict allowlist of origins and protocols (typically `https:` and optionally `http:` for localhost dev). ([OWASP Cheat Sheet Series][8])
Insecure patterns:
* `location.replace(getParam("next"))`
* `window.location = userSuppliedUrl`
* `location.assign(window.redirectTo || "/")` where `redirectTo` can be clobbered or attacker-set ([OWASP Cheat Sheet Series][8])
Detection hints:
* Search for `window.location`, `location.href`, `location.assign`, `location.replace`.
* Search for common redirect parameters: `next`, `returnTo`, `redirect`, `url`, `continue`.
* Search for `javascript:` literal usage. ([MDN Web Docs][4])
Fix:
* Parse and validate with `new URL(value, location.origin)` and then enforce:
* `url.protocol` in `{ "https:" }` (and only include `http:` in explicit dev-only code paths),
* `url.origin` equals `location.origin` for internal redirects, or in a strict allowlist for external redirects,
* optionally allow only specific path prefixes. ([MDN Web Docs][4])
* If validation fails, navigate to a safe default (home/dashboard).
Mitigation:
* Deploy strict CSP and Trusted Types enforcement to reduce the impact of DOM XSS sinks, but note that Trusted Types do not prevent every possible unsafe navigation scenario on their own. ([W3C][15])
False positive notes:
* Some apps intentionally support external redirects (SSO, payment flows). Those MUST be allowlisted and documented.
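The parse-then-validate fix can be sketched as below; `ALLOWED_ORIGINS` and the `https://app.example.com` origin are hypothetical placeholders for your real deployment origin(s):

```javascript
const ALLOWED_ORIGINS = new Set(["https://app.example.com"]); // hypothetical

// resolveRedirect validates an untrusted "next" value and returns a safe
// destination, falling back to "/" on any validation failure.
function resolveRedirect(next, currentOrigin = "https://app.example.com") {
  let url;
  try {
    // Resolving against the current origin turns relative paths into
    // same-origin absolute URLs and normalizes tricks like "//evil.example".
    url = new URL(next, currentOrigin);
  } catch {
    return "/"; // unparseable -> safe default
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return "/"; // blocks javascript:, data:, etc.
  }
  if (!ALLOWED_ORIGINS.has(url.origin)) {
    return "/"; // origin not on the allowlist
  }
  return url.pathname + url.search + url.hash;
}

if (resolveRedirect("javascript:alert(1)") !== "/") {
  throw new Error("script-bearing URL not rejected");
}
```

Returning only path + query + hash for internal redirects is a deliberate choice: even an allowlisted origin is re-emitted as a relative navigation.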
---
### JS-URL-002: Sanitize URLs before inserting into DOM URL contexts (`href`, `src`, etc.)
Severity: Low (High if you can prove an attacker can fully control the URL)
IMPORTANT: This can cause a lot of false positives. Please perform extra analysis to determine if the url is fully attacker controlled. If not fully attacker controlled, then this is informational at best.
Required:
* MUST treat setting URL-bearing DOM attributes/properties as security-sensitive, especially:
* `a.href`, `img.src`, `script.src`, `iframe.src`, `form.action`, `link.href`.
* MUST prevent script-bearing schemes (`javascript:` and other active schemes) when values can be attacker-influenced. ([MDN Web Docs][4])
* SHOULD prefer setting properties (e.g., `a.href = url.toString()`) after parsing and validation, rather than string concatenation.
Insecure patterns:
* `link.href = getParam("u")`
* `el.setAttribute("href", userInput)` without validation
* constructing URLs via concatenation with untrusted pieces
Detection hints:
* Search for `.href =`, `.src =`, `.action =`, `setAttribute("href"`, `setAttribute("src"`.
* Search for `javascript:` / `data:` usage in URLs. ([MDN Web Docs][4])
IMPORTANT: This rule can produce many false positives. Perform extra analysis to determine whether the URL is fully attacker-controlled; if it is not, treat the finding as informational at best.
Fix:
* Use `new URL(...)` and validate:
* protocol allowlist
* avoid passing user-provided values into `<script src>` at all (treat as code execution). ([OWASP Cheat Sheet Series][8])
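A sketch of that validation for non-script URL attributes (the helper name is illustrative; `<script src>` must not go through this path at all):

```javascript
// Returns a validated URL string for href/src-style attributes, or null.
function sanitizeUrlAttribute(value, base) {
  let url;
  try {
    url = new URL(value, base);
  } catch {
    return null; // unparseable input
  }
  const allowedProtocols = new Set(["https:", "http:"]);
  return allowedProtocols.has(url.protocol) ? url.toString() : null; // rejects javascript:, data:
}

// Usage in the DOM (property assignment, never string concatenation):
// const safe = sanitizeUrlAttribute(userInput, location.origin);
// if (safe !== null) link.href = safe; else link.removeAttribute("href");
```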
---
### JS-CSP-001: Use CSP; meta delivery is allowed
Severity: Medium to High (depends on threat model; High when handling untrusted content)
NOTE: The CSP's `script-src` directive matters most. Other directives are less critical and can generally be excluded for ease of development.
Required:
* SHOULD deploy a CSP as a major defense-in-depth against XSS. ([MDN Web Docs][10])
* MAY provide CSP via `<meta http-equiv="Content-Security-Policy" ...>` when headers are not available. ([MDN Web Docs][1])
* If CSP is delivered via meta, MUST:
* place it early (before scripts/resources you want governed), and
* not rely on unsupported directives in meta policies (`report-uri`, `frame-ancestors`, `sandbox`). ([W3C][3])
* MUST avoid adding `unsafe-inline` as a “quick fix” for CSP issues unless explicitly required and reviewed (it defeats much of CSP's purpose). ([MDN Web Docs][10])
* MUST avoid adding `unsafe-eval` unless explicitly required and reviewed (it allows eval-like APIs that are commonly abused). ([MDN Web Docs][10])
Insecure patterns:
* No CSP present anywhere (repo HTML or server/edge) for an app that renders untrusted content.
* CSP includes `script-src 'unsafe-inline'` and/or `script-src 'unsafe-eval'` without strong justification. ([MDN Web Docs][10])
* CSP delivered via meta but includes `frame-ancestors` (it will be ignored in meta). ([W3C][3])
Detection hints:
* Search HTML for `<meta http-equiv="Content-Security-Policy"`.
* Search server/edge configs for `Content-Security-Policy` header.
* If CSP is only in meta, check it appears before any `<script>` tags you want governed. ([W3C][3])
Fix:
* Prefer header-delivered CSP at the server/edge.
* If constrained to meta, keep a strong allowlist CSP and document the limitations; implement clickjacking protections (e.g., `frame-ancestors`) at the server/edge, not in meta. ([W3C][3])
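As a sketch, a minimal meta-delivered policy focused on `script-src` (the CDN origin is a placeholder) might look like:

```html
<!-- Must appear before any <script> it should govern.
     frame-ancestors, report-uri, and sandbox are ignored in meta delivery. -->
<meta http-equiv="Content-Security-Policy"
      content="script-src 'self' https://cdn.example.com">
```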
---
### JS-CSP-002: Prefer strict CSP (nonces/hashes); avoid inline/eval patterns in code
Severity: Medium
NOTE: The CSP's `script-src` directive matters most. Other directives are less critical and can generally be excluded for ease of development.
Required:
* SHOULD design frontend code to work under a strict CSP:
* avoid inline scripts and inline event handlers,
* avoid eval-like APIs (see JS-XSS-003),
* allow scripts via nonce or hash when needed. ([MDN Web Docs][10])
Insecure patterns:
* Large amounts of inline script blocks and inline `onclick="..."` handlers.
* Libraries that require `unsafe-eval`.
Detection hints:
* Search for `<script>` blocks with inline code, `onclick="`, `onload="`, etc.
* Search for CSP directives containing `unsafe-inline` or `unsafe-eval`. ([MDN Web Docs][10])
Fix:
* Move inline scripts into external JS files (same-origin).
* Use nonces/hashes for any unavoidable inline blocks. ([MDN Web Docs][10])
---
### JS-TT-001: Use Trusted Types to reduce DOM XSS attack surface (where supported)
Severity: Low
Required:
* SHOULD consider enabling Trusted Types enforcement with CSP `require-trusted-types-for 'script'` to make many DOM XSS sinks reject raw strings. ([MDN Web Docs][11])
* If using Trusted Types, SHOULD also use the CSP `trusted-types` directive to restrict which policies can be created (reduces policy sprawl and improves auditability). ([MDN Web Docs][16])
* MUST keep Trusted Types policy code small, heavily reviewed, and used as the only path to produce trusted values for sinks. ([W3C][15])
Insecure patterns:
* “Trusted Types enabled” but policy simply returns input unchanged (no sanitization/validation).
* Many ad-hoc policies created across the codebase without restriction.
* Belief that Trusted Types alone prevents all unsafe navigations or all XSS classes. (It targets DOM injection sinks; it is not a universal sandbox.) ([W3C][15])
Detection hints:
* Search for CSP directives: `require-trusted-types-for` and `trusted-types`.
* Search code for `trustedTypes.createPolicy(` and inspect policy implementations. ([MDN Web Docs][11])
Fix:
* Add a small set of well-reviewed policies (e.g., `createHTML` that sanitizes).
* Restrict allowed policies via `trusted-types <policyName...>`.
* Migrate sinks to require `TrustedHTML` / `TrustedScriptURL` as appropriate. ([MDN Web Docs][11])
---
### JS-MSG-001: `postMessage` must use strict origin validation and explicit targetOrigin
Severity: Medium (High if dangerous behavior can be triggered via postMessage)
Required:
* When sending messages, MUST set an explicit `targetOrigin` (not `*`) to avoid sending data to an unexpected origin after redirects or window origin changes. ([MDN Web Docs][5])
* When receiving messages, MUST:
* Validate `event.origin` exactly against an allowlist of expected origins (no substring matching). ([OWASP Cheat Sheet Series][6])
* Consider validating `event.source` (expected window reference) when applicable. ([MDN Web Docs][5])
* Validate `event.data` structure (schema/shape) and treat it purely as data (never evaluate it as code and never insert into DOM with `innerHTML`). ([OWASP Cheat Sheet Series][6])
Insecure patterns:
* `otherWindow.postMessage(payload, "*")`
* `window.addEventListener("message", (e) => { doSomething(e.data) })` with no `origin` check
* `if (e.origin.includes("trusted.com"))` (substring checks)
* `el.innerHTML = e.data` ([OWASP Cheat Sheet Series][6])
Detection hints:
* Search for `postMessage(`, `addEventListener("message"`, `onmessage =`.
* Audit all handlers for explicit allowlist checks on `event.origin`. ([OWASP Cheat Sheet Series][6])
Fix:
* Define an allowlist:
* `const ALLOWED = new Set(["https://app.example.com", "https://accounts.example.com"]);`
NOTE: For ease of development, you can use the current page's origin `window.location.origin` as a safe default origin.
* On receive:
* `if (!ALLOWED.has(event.origin)) return;`
* Validate `event.data` with a strict schema and reject unknown/extra fields.
* On send:
* use the exact expected origin string as `targetOrigin`. ([OWASP Cheat Sheet Series][6])
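The receive-side checks above can be sketched as a pure validator (origin names and the `{ type, payload }` schema are illustrative):

```javascript
const ALLOWED = new Set(["https://app.example.com", "https://accounts.example.com"]);

// Returns the validated message data, or null if the message must be ignored.
function acceptMessage(event) {
  if (!ALLOWED.has(event.origin)) return null;          // exact-match allowlist, no substrings
  const d = event.data;
  if (typeof d !== "object" || d === null) return null; // reject non-object payloads
  const keys = Object.keys(d);
  if (keys.length !== 2 || typeof d.type !== "string" || typeof d.payload !== "string") {
    return null; // strict schema: reject unknown/extra fields
  }
  return d; // treated purely as data; never eval'd or assigned to innerHTML
}
```

On the send side, pass the exact expected origin, e.g. `otherWindow.postMessage(msg, "https://app.example.com")`, never `"*"`.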
Mitigation:
* Combine with a strict CSP and avoid DOM sinks in message paths. ([MDN Web Docs][10])
---
### JS-STORAGE-001: Web Storage is not a safe place for secrets (and is attacker-influenceable)
Severity: Low
Required:
* MUST NOT store sensitive secrets or session identifiers in `localStorage` (or `sessionStorage`) if compromise would matter; a single XSS can exfiltrate everything in storage. ([OWASP Cheat Sheet Series][6])
* MUST treat values read from storage as untrusted input (attackers can load malicious values into storage via XSS). ([OWASP Cheat Sheet Series][6])
* SHOULD prefer server-set cookies with `HttpOnly` for session identifiers (JS cannot set `HttpOnly`, so avoid storing session IDs in JS-accessible storage). ([OWASP Cheat Sheet Series][6])
* SHOULD avoid hosting multiple unrelated apps on the same origin if they rely on storage separation (storage is origin-wide). ([OWASP Cheat Sheet Series][6])
Insecure patterns:
* `localStorage.setItem("access_token", token)`
* `localStorage.setItem("session", sessionId)`
* Assuming `localStorage` is “trusted because same-origin.”
Detection hints:
* Search for `localStorage.getItem`, `localStorage.setItem`, `sessionStorage.*`.
* Flag storage keys named `token`, `jwt`, `session`, `auth`, `refresh`. ([OWASP Cheat Sheet Series][6])
Fix:
* Use server-managed sessions or short-lived tokens delivered and rotated securely, with careful XSS defenses (CSP/Trusted Types) and minimal JS exposure.
* If storage must be used for non-sensitive state, keep it non-auth and validate/escape before use.
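Treating storage reads as untrusted input can be sketched as follows (the key name and `theme` allowlist are illustrative):

```javascript
// Validates non-sensitive UI state read back from localStorage.
// raw = localStorage.getItem("ui-state") in the browser.
function readUiState(raw) {
  const fallback = { theme: "light" };
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return fallback;
    // Allowlist values instead of trusting whatever was stored.
    const theme = parsed.theme === "dark" ? "dark" : "light";
    return { theme };
  } catch {
    return fallback; // tampered or corrupt data
  }
}
```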
---
### JS-SUPPLY-001: Third-party JavaScript is a major supply-chain risk; minimize and control it
Severity: Low
Required:
* MUST treat third-party JS as equivalent to first-party JS in privilege (it can execute arbitrary code in your origin and access DOM data). ([OWASP Cheat Sheet Series][7])
* SHOULD minimize third-party scripts and prefer:
* self-hosting / script mirroring,
* strict CSP allowlists,
* SRI for any CDN-hosted scripts,
* ongoing monitoring for unexpected changes. ([OWASP Cheat Sheet Series][7])
Insecure patterns:
* Loading arbitrary remote scripts from many vendors without review.
* Using tag managers that can dynamically inject scripts with no integrity controls.
* Allowing scripts from broad wildcards in CSP (e.g., `script-src *`). ([MDN Web Docs][10])
Detection hints:
* Search HTML for `<script src="https://...">` and `tag manager` snippets.
* Search CSP `script-src` sources for wildcards or overly broad domains.
* Search for dynamic script injection: `document.createElement("script")`, `script.src = ...`, `appendChild(script)`. ([OWASP Cheat Sheet Series][8])
Fix:
* Remove unnecessary third-party tags.
* Self-host or mirror scripts where possible.
* Lock down CSP `script-src` to the smallest set of trusted sources.
* Add SRI for CDN scripts/styles. ([OWASP Cheat Sheet Series][7])
---
### JS-SRI-001: Use Subresource Integrity (SRI) for third-party scripts/styles
Severity: Low
Required:
* SHOULD use SRI to ensure browsers only load third-party resources if they match an expected cryptographic hash. ([MDN Web Docs][12])
* MUST update SRI hashes whenever the underlying resource changes (pin versions; avoid “latest” URLs).
Insecure patterns:
* `<script src="https://cdn.example.com/lib.js"></script>` with no `integrity`.
* Loading `latest` or unpinned third-party resources.
Detection hints:
* Search for `<script src="https://` and `<link rel="stylesheet" href="https://` without `integrity=`.
* Check whether `integrity` is present and uses strong hashes (sha256/384/512 are typical). ([MDN Web Docs][12])
Fix:
* Add `integrity="sha384-..."` (or appropriate) and ensure proper CORS mode where needed.
* Prefer self-hosting critical libraries.
---
### FS-DOMC-001: Prevent DOM clobbering (avoid relying on `window`/`document` named properties)
Severity: Medium to High (can become Critical if it enables script loading or `javascript:` navigation)
Required:
* MUST NOT rely on implicit global variables or `window.someName` / `document.someName` lookups that can be clobbered by injected HTML elements with matching `id`/`name`. ([OWASP Cheat Sheet Series][8])
* MUST avoid patterns like `let x = window.redirectTo || "/safe"; location.assign(x);` where `redirectTo` could be clobbered to an `<a>` element whose `href` is attacker-controlled (including `javascript:`). ([OWASP Cheat Sheet Series][8])
* SHOULD use explicit variable declarations, local scope, and explicit DOM queries (`getElementById`) rather than named property access. ([OWASP Cheat Sheet Series][8])
* If the app inserts user-controlled markup (even sanitized), SHOULD ensure sanitization strategies consider `id`/`name` collisions. ([OWASP Cheat Sheet Series][8])
Insecure patterns:
* `const cfg = window.config || {};` used for security-sensitive URLs.
* `const redirect = window.redirectTo || "/"; location.assign(redirect);` ([OWASP Cheat Sheet Series][8])
* Loading scripts from `window.*` config values without strict validation.
Detection hints:
* Search for `window.` and `document.` used as config stores (especially `||` fallback patterns).
* Search for usage of `location.assign/replace` with variables that come from `window`/`document` properties.
* Search for dynamic script creation (`createElement('script')`) where `.src` comes from a non-local variable. ([OWASP Cheat Sheet Series][8])
Fix:
* Store config in module-scoped constants (not on `window`/`document`) and pass it explicitly.
* Validate any URL-like config with protocol/origin allowlists (see JS-URL-001). ([OWASP Cheat Sheet Series][8])
* Consider hardening: sanitization, CSP, and (in limited cases) freezing sensitive objects, but treat these as defense-in-depth, not a substitute for safe coding patterns. ([OWASP Cheat Sheet Series][8])
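The module-scoped-config fix can be sketched as below (origin and fallback values are illustrative); because the constants never live on `window`/`document`, an injected `<a id="redirectTo">` cannot clobber them:

```javascript
// Module-scoped, frozen config passed explicitly instead of read from window.*.
const APP_ORIGIN = "https://app.example.com";
const FALLBACK = "/";

function resolveConfiguredRedirect(value) {
  try {
    const url = new URL(value, APP_ORIGIN);
    // Same-origin redirects only; everything else (including javascript:) falls back.
    return url.origin === APP_ORIGIN ? url.pathname + url.search : FALLBACK;
  } catch {
    return FALLBACK; // unparseable input
  }
}
```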
---
## 5) Practical scanning heuristics (how to “hunt”)
When actively scanning, use these high-signal patterns:
* DOM XSS sinks:
* `.innerHTML`, `.outerHTML`, `insertAdjacentHTML(`
* `document.write(`, `document.writeln(` ([OWASP Cheat Sheet Series][2])
* Dangerous navigation / URL sinks:
* `window.location`, `location.href`, `location.assign`, `location.replace`
* `javascript:` literals (and other suspicious schemes like `data:text/html`) ([MDN Web Docs][4])
* String-to-code execution:
* `eval(`, `new Function`, `setTimeout("`, `setInterval("` ([MDN Web Docs][10])
* Event-handler string injection:
* `.setAttribute("on`, `.onclick =`, `.onload =` with strings ([OWASP Cheat Sheet Series][2])
* `postMessage`:
* `postMessage(` with `"*"` as targetOrigin
* `addEventListener("message"` without strict `event.origin` allowlist checks ([MDN Web Docs][5])
* Storage:
* `localStorage.setItem(` / `getItem(`, `sessionStorage.*`
* keys containing `token`, `jwt`, `session`, `auth`, `refresh` ([OWASP Cheat Sheet Series][6])
* CSP and related:
* `Content-Security-Policy` header config (server/edge)
* `<meta http-equiv="Content-Security-Policy" ...>`
* CSP containing `unsafe-inline` or `unsafe-eval`
* `require-trusted-types-for` / `trusted-types` directives ([MDN Web Docs][1])
* Third-party scripts:
* `<script src="https://...">` without `integrity=`
* Tag manager snippets and dynamic script injection code paths ([MDN Web Docs][12])
* DOM clobbering gadgets:
* `window.<name> || ...` and `document.<name> || ...` patterns
* security-sensitive usage of `window`/`document` properties as config sources ([OWASP Cheat Sheet Series][8])
Always try to confirm:
* data origin (untrusted vs trusted),
* sink type (HTML parse, navigation, code execution, message handling, storage),
* protective controls present (CSP, Trusted Types, sanitizers, strict allowlists, schema validation).
---
## 6) Sources (accessed 2026-01-27)
Primary standards / platform docs:
* W3C Content Security Policy Level 2 (HTML `<meta>` delivery restrictions; unsupported directives in meta CSP): `https://www.w3.org/TR/CSP2/` ([W3C][3])
* MDN: CSP Guide (strict CSP, nonces/hashes, `unsafe-inline`/`unsafe-eval`, eval blocking): `https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CSP` ([MDN Web Docs][10])
* MDN: `<meta http-equiv>` (CSP via meta and warning about meta-based security headers): `https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/meta/http-equiv` ([MDN Web Docs][1])
* MDN: `frame-ancestors` (and note it's not supported in `<meta>`): `https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy/frame-ancestors` ([MDN Web Docs][18])
DOM XSS and dangerous sinks:
* OWASP: DOM Based XSS Prevention Cheat Sheet (dangerous sinks + safe patterns like `textContent`): `https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][2])
* MDN: `innerHTML` (security considerations): `https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML` ([MDN Web Docs][19])
* MDN: `insertAdjacentHTML` (security considerations): `https://developer.mozilla.org/en-US/docs/Web/API/Element/insertAdjacentHTML` ([MDN Web Docs][20])
* MDN: `document.write()` / `document.writeln()` (security considerations): `https://developer.mozilla.org/en-US/docs/Web/API/Document/write` and `https://developer.mozilla.org/en-US/docs/Web/API/Document/writeln` ([MDN Web Docs][13])
URL scheme hazards:
* MDN: `javascript:` URLs (execution on navigation; discouraged; references `window.location`): `https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Schemes/javascript` ([MDN Web Docs][4])
Trusted Types:
* W3C: Trusted Types spec (DOM XSS sinks include `Element.innerHTML` and `Location.href` setters; goals and limitations): `https://www.w3.org/TR/trusted-types/` ([W3C][15])
* MDN: `require-trusted-types-for` directive: `https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy/require-trusted-types-for` ([MDN Web Docs][11])
* MDN: `trusted-types` directive: `https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy/trusted-types` ([MDN Web Docs][16])
Cross-window messaging:
* MDN: `window.postMessage` (security guidance: specify targetOrigin; validate origin): `https://developer.mozilla.org/en-US/docs/Web/API/Window/postMessage` ([MDN Web Docs][5])
* OWASP: HTML5 Security Cheat Sheet (Web Messaging guidance: explicit origin, strict checks, no `innerHTML`): `https://cheatsheetseries.owasp.org/cheatsheets/HTML5_Security_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][6])
Third-party scripts and integrity:
* OWASP: Third Party JavaScript Management Cheat Sheet (risks and mitigations including SRI/mirroring): `https://cheatsheetseries.owasp.org/cheatsheets/Third_Party_Javascript_Management_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][7])
* MDN: Subresource Integrity overview: `https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Subresource_Integrity` ([MDN Web Docs][12])
* W3C: Subresource Integrity spec: `https://www.w3.org/TR/sri-2/` ([W3C][21])
DOM clobbering:
* OWASP: DOM Clobbering Prevention Cheat Sheet (named property access risk; example attacks involving `location.assign` and `javascript:`): `https://cheatsheetseries.owasp.org/cheatsheets/DOM_Clobbering_Prevention_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][8])
[1]: https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/meta/http-equiv "https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/meta/http-equiv"
[2]: https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html "https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html"
[3]: https://www.w3.org/TR/CSP2/ "Content Security Policy Level 2"
[4]: https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Schemes/javascript "javascript: URLs - URIs | MDN"
[5]: https://developer.mozilla.org/en-US/docs/Web/API/Window/postMessage "https://developer.mozilla.org/en-US/docs/Web/API/Window/postMessage"
[6]: https://cheatsheetseries.owasp.org/cheatsheets/HTML5_Security_Cheat_Sheet.html "https://cheatsheetseries.owasp.org/cheatsheets/HTML5_Security_Cheat_Sheet.html"
[7]: https://cheatsheetseries.owasp.org/cheatsheets/Third_Party_Javascript_Management_Cheat_Sheet.html "https://cheatsheetseries.owasp.org/cheatsheets/Third_Party_Javascript_Management_Cheat_Sheet.html"
[8]: https://cheatsheetseries.owasp.org/cheatsheets/DOM_Clobbering_Prevention_Cheat_Sheet.html "https://cheatsheetseries.owasp.org/cheatsheets/DOM_Clobbering_Prevention_Cheat_Sheet.html"
[9]: https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Attributes/rel/noopener "https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Attributes/rel/noopener"
[10]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CSP "https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CSP"
[11]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy/require-trusted-types-for "https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy/require-trusted-types-for"
[12]: https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Subresource_Integrity "https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Subresource_Integrity"
[13]: https://developer.mozilla.org/en-US/docs/Web/API/Document/write "https://developer.mozilla.org/en-US/docs/Web/API/Document/write"
[14]: https://developer.mozilla.org/en-US/docs/Web/API/Document/writeln "https://developer.mozilla.org/en-US/docs/Web/API/Document/writeln"
[15]: https://www.w3.org/TR/trusted-types/ "https://www.w3.org/TR/trusted-types/"
[16]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy/trusted-types "https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy/trusted-types"
[18]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy/frame-ancestors "https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy/frame-ancestors"
[19]: https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML "https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML"
[20]: https://developer.mozilla.org/en-US/docs/Web/API/Element/insertAdjacentHTML "https://developer.mozilla.org/en-US/docs/Web/API/Element/insertAdjacentHTML"
[21]: https://www.w3.org/TR/sri-2/ "https://www.w3.org/TR/sri-2/"

# jQuery Frontend Security Spec (jQuery 4.0.x, modern browsers)
This document is designed as a **security spec** that supports:
1. **Secure-by-default code generation** for new jQuery-based frontend code.
2. **Security review / vulnerability hunting** in existing jQuery-based code (passive “notice issues while working” and active “scan the repo and report findings”).
It is intentionally written as a set of **normative requirements** (“MUST/SHOULD/MAY”) plus **audit rules** (what bad patterns look like, how to detect them, and how to fix/mitigate them).
---
## 0) Safety, boundaries, and anti-abuse constraints (MUST FOLLOW)
* MUST NOT request, output, log, or commit secrets (API keys, passwords, private keys, session tokens, refresh tokens, CSRF tokens, session cookies).
* MUST treat the browser as an attacker-controlled environment:
* Frontend checks (UI gating, “disable button”, hidden fields, client-side validation) MUST NOT be treated as authorization or a security boundary.
* Server-side authorization and validation MUST exist even if frontend is “correct”.
* MUST NOT “fix” security by disabling protections (e.g., relaxing CSP to allow `unsafe-inline`, enabling JSONP “because it works”, adding broad CORS, disabling sanitization, suppressing security checks).
* MUST provide evidence-based findings during audits: cite file paths, code snippets, and relevant configuration values.
* MUST treat uncertainty honestly: if a protection might exist at the edge (CDN/WAF/reverse proxy headers like CSP), report it as “not visible in repo; verify at runtime/config”.
---
## 1) Operating modes
### 1.1 Generation mode (default)
When asked to write new jQuery code or modify existing jQuery code:
* MUST follow every **MUST** requirement in this spec.
* SHOULD follow every **SHOULD** requirement unless the user explicitly says otherwise.
* MUST prefer safe-by-default patterns: text insertion, DOM node construction, allowlists, and proven sanitization libraries over custom escaping.
* MUST avoid introducing new risky sinks (HTML string building, dynamic script loading, JSONP, inline script/event-handler attributes, unsafe URL assignment, unsafe object merging).
### 1.2 Passive review mode (always on while editing)
While working anywhere in a repo that uses jQuery (even if the user did not ask for a security scan):
* MUST “notice” violations of this spec in touched/nearby code.
* SHOULD mention issues as they come up, with a brief explanation + safe fix.
### 1.3 Active audit mode (explicit scan request)
When the user asks to “scan”, “audit”, or “hunt for vulns”:
* MUST systematically search the codebase for violations of this spec.
* MUST output findings in the structured format (see §2.3).
Recommended audit order:
1. jQuery sourcing, versions, and dependency hygiene (script tags, lockfiles, CDN usage, SRI).
2. CSP / Trusted Types / security headers posture (in repo and at runtime if observable).
3. DOM XSS: untrusted sources → jQuery sinks (`.html`, `.append`, `$("<…>")`, `.load`, etc.).
4. Script execution sinks: JSONP, `dataType:"script"`, `$.getScript`, dynamic `<script>` insertion.
5. URL/attribute assignment (`href`, `src`, `style`, `on*` attributes).
6. Prototype pollution / unsafe object merging (`$.extend` patterns).
7. AJAX auth patterns + CSRF for cookie-based sessions.
8. Third-party plugins and untrusted content rendering paths (comments, WYSIWYG, markdown-to-HTML).
---
## 2) Definitions and review guidance
### 2.1 Untrusted input (treat as attacker-controlled unless proven otherwise)
Examples include:
* Any data from the server that originates from users (user profiles, comments, “display name”, rich text, filenames).
* Data from third-party APIs or services.
* Browser-controlled sources:
* `location.href`, `location.search`, `location.hash`
* `document.URL`, `document.baseURI`, `document.referrer`
* `window.name`
* `localStorage` / `sessionStorage`
* `postMessage` event data (unless strict origin and schema validation exists)
* Any DOM content that could have been injected previously (stored XSS)
### 2.2 High-risk “sinks” in jQuery contexts
A sink is a code path where untrusted input can become interpreted as executable code or HTML.
Key jQuery sink categories:
* HTML insertion / parsing:
* DOM manipulation methods that accept HTML strings such as `.html()`, `.append()`, and related methods (see CVE notes below). ([NVD][1])
* `$(htmlString)` (when the argument can be interpreted as HTML markup).
* `jQuery.parseHTML(html, …, keepScripts)` especially with `keepScripts=true`. ([jQuery API][2])
* `.load(url)` (loads HTML into DOM; has special script execution behavior). ([jQuery API][3])
* Script execution / dynamic code loading:
* `$.getScript()` / `$.ajax({ dataType: "script" })` (executes fetched JavaScript). ([jQuery API][4])
* JSONP (`dataType: "jsonp"` or implicit JSONP behavior) (executes remote JavaScript as a response). ([jQuery API][5])
* `eval`, `new Function`, `setTimeout("…")`, `setInterval("…")`, `$.globalEval` (if present)
* Dangerous attribute assignment:
* Assigning untrusted strings to `href`, `src`, `srcdoc`, `style`, or event-handler attributes (`onload`, `onclick`, etc.)
* `javascript:` URLs are particularly dangerous and discouraged. ([MDN Web Docs][6])
### 2.3 Required audit finding format
For each issue found, output:
* Rule ID:
* Severity: Critical / High / Medium / Low
* Location: file path + function/component + line(s)
* Evidence: the exact code/config snippet
* Impact: what could go wrong, who can exploit it
* Fix: safe change (prefer minimal diff)
* Mitigation: defense-in-depth if immediate fix is hard
* False positive notes: what to verify if uncertain
---
## 3) Secure baseline: minimum production configuration (MUST in production)
This is the smallest “production baseline” that prevents common jQuery-related security failures.
### 3.1 Use a supported, patched jQuery version (MUST)
* MUST use a supported jQuery major version and keep it updated.
* As of 2026-01-27, the jQuery project ships jQuery 4.0.0 as the latest major release. ([blog.jquery.com][7])
* If you must support very old browsers (notably IE < 11), jQuery 4 does not support them and you may need to stay on jQuery 3.x; treat this as a higher risk posture and patch aggressively. ([blog.jquery.com][7])
### 3.2 Load jQuery safely (MUST)
* MUST load jQuery only from:
* Your own build pipeline (bundled via npm/yarn + lockfile), or
* The official jQuery CDN / a trusted CDN with Subresource Integrity (SRI) enabled.
* If loading from a CDN, SHOULD use SRI (`integrity`) and correct `crossorigin` settings; the jQuery project explicitly supports and recommends SRI on its CDN. (Retrieved from [jquery.com][8])
### 3.3 CSP + Trusted Types (SHOULD, and MUST where available/required by policy)
* SHOULD deploy a Content Security Policy (CSP) that reduces XSS impact (especially `script-src` restrictions and avoiding `unsafe-inline`). If headers cannot be set by the HTTP server, CSP can be delivered via the `<meta http-equiv="Content-Security-Policy" content="...">` tag. ([OWASP Cheat Sheet Series][9]) NOTE: The CSP's `script-src` directive matters most. Other directives are less critical and can generally be excluded for ease of development.
* SHOULD consider Trusted Types as a strong defense-in-depth against DOM XSS. ([W3C][10])
* If you deploy the CSP directive `require-trusted-types-for`, then code MUST route DOM-injection through Trusted Types policies. ([MDN Web Docs][11])
* Note: jQuery 4.0 explicitly added Trusted Types support so that TrustedHTML can be used with jQuery manipulation methods without violating `require-trusted-types-for`. ([blog.jquery.com][7])
### 3.4 Security headers and cookie posture (defense in depth; SHOULD)
Even though these are typically set server-side, they materially reduce the blast radius of jQuery-related mistakes. However, if the scope is limited to the frontend application alone, they cannot be acted on directly.
* SHOULD set common security headers (CSP, `X-Content-Type-Options: nosniff`, clickjacking protection via `frame-ancestors` / `X-Frame-Options`, `Referrer-Policy`). ([OWASP Cheat Sheet Series][12])
* SHOULD avoid storing long-lived secrets/tokens in places accessible to JavaScript (like `localStorage`) unless the threat model explicitly accepts “XSS == account takeover”. This is not jQuery-specific, but jQuery-heavy DOM manipulation increases the chance of DOM XSS regressions; reduce the payoff.
---
## 4) Rules (generation + audit)
Each rule contains: required practice, insecure patterns, detection hints, and remediation.
### JQ-SUPPLY-001: jQuery MUST be patched; do not run known vulnerable versions
Severity: Medium (High if internet-facing app AND version is known-vulnerable)
NOTE: Before performing an upgrade, get consent from the user and try to understand whether they have reasons for staying on an older version. Upgrading can break applications in unexpected ways. Report and recommend upgrades rather than performing them unilaterally.
Required:
* MUST NOT use jQuery versions with known high-impact vulnerabilities when a patched version exists.
* MUST upgrade past:
* CVE-2019-11358 (prototype pollution in jQuery before 3.4.0). ([NVD][13])
* CVE-2020-11022 / CVE-2020-11023 (XSS risks in DOM manipulation methods when handling untrusted HTML; patched in 3.5.0). ([NVD][1])
Insecure patterns:
* Script tags or package manifests referencing old jQuery (e.g., `jquery-1.*`, `jquery-2.*`, `jquery-3.3.*`, `jquery-3.4.*`, `jquery-3.4.1`, etc.).
* Bundled vendor directories containing old minified jQuery without an upgrade path.
Detection hints:
* Search HTML/templates for `jquery-` and parse version strings.
* Check `package.json`, `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`.
* Check `vendor/`, `public/`, `static/`, `assets/`, `wwwroot/` for `jquery*.js`.
Fix:
* Upgrade to current jQuery (prefer latest stable major; as of 2026-01-27, 4.0.0 is current). ([blog.jquery.com][7])
* If upgrade is constrained, at minimum upgrade beyond the CVE thresholds and add compensating controls (strong CSP, strict sanitization, remove risky APIs like JSONP, remove deep-extend of untrusted objects).
Notes:
* If a product requirement forces old versions, report as “accepted risk requiring compensating controls”.
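For audits, the CVE thresholds above can be sketched as a small version check (the helper name is hypothetical; it only handles plain `x.y.z` strings, not prerelease tags):

```javascript
// Flags jQuery versions below the patched thresholds discussed above.
function isVulnerableJquery(version) {
  const [major, minor] = version.split(".").map(Number);
  if (Number.isNaN(major) || Number.isNaN(minor)) return true; // unparseable: flag for review
  if (major < 3) return true;                 // 1.x / 2.x: CVE-2019-11358 and older issues
  if (major === 3 && minor < 5) return true;  // < 3.5.0: CVE-2020-11022 / CVE-2020-11023
  return false;
}
```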
---
### JQ-SUPPLY-002: Third-party script loading SHOULD use integrity and trusted origins
Severity: High
Required:
* MUST load jQuery and plugins only from trusted origins.
* If loaded from CDN, SHOULD use SRI (`integrity`) and correct `crossorigin` handling. ([jquery.com][8])
Insecure patterns:
* `<script src="https://…/jquery.min.js"></script>` with no `integrity`.
* Loading jQuery from random third-party CDNs without an explicit trust decision.
Detection hints:
* Scan HTML for `<script src=` and check for `integrity=` + `crossorigin=`.
* Identify dynamic script insertion with untrusted URLs (see JQ-EXEC-001).
Fix:
* Prefer bundling via npm + lockfile.
* If using CDN, copy official script tag (jQuery CDN supports SRI). ([jquery.com][8])
Note: If you cannot obtain the correct SRI hash, skip this step but tell the user. A wrong `integrity` value blocks the script from loading and the app will not function; in that case remove the attribute and inform the user.
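The detection step can be sketched as a rough attribute check over a script tag's text. This is regex-based and intentionally crude; a real scanner should use an HTML parser, and the example URLs below are placeholders.

```javascript
// Given the text of a <script> tag loading jQuery from a CDN, report which
// SRI-related attributes are missing. Rough heuristic, not a full HTML parse.
function missingSriAttributes(scriptTag) {
  const missing = [];
  if (!/\bintegrity\s*=\s*["']sha(256|384|512)-/.test(scriptTag)) {
    missing.push("integrity");
  }
  if (!/\bcrossorigin\s*=/.test(scriptTag)) {
    missing.push("crossorigin");
  }
  return missing;
}
```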
---
### JQ-XSS-001: Untrusted data MUST NOT be inserted as HTML via jQuery DOM-manipulation methods
Severity: High (if attacker-controlled content reaches these sinks)
Required:
* MUST treat any HTML string insertion as a code execution boundary.
* MUST use safe alternatives for untrusted text:
* `.text(untrusted)` (text, not HTML). ([jQuery API][14])
* `.val(untrusted)` for form fields. ([jQuery API][15])
* Create elements and set text/attributes safely instead of concatenating HTML strings.
Insecure patterns (examples):
* `$(selector).html(untrusted)`
* `$(selector).append(untrusted)`
* `$(selector).before(untrusted)` / `.after(untrusted)` / `.replaceWith(untrusted)` / `.wrap(untrusted)` (and similar)
* Building markup: `"<div>" + untrusted + "</div>"` then passing to jQuery
Detection hints:
* Grep for: `.html(`, `.append(`, `.prepend(`, `.before(`, `.after(`, `.replaceWith(`, `.wrap(`, `.wrapAll(`, `.wrapInner(`
* Trace dataflow into these calls from sources in §2.1.
Fix:
* Replace with `.text()` / `.val()` or node construction:
* `const $el = $("<span>").text(untrusted); container.append($el);`
* If the output must contain limited markup, see JQ-XSS-002 (sanitization).
Notes:
* Older jQuery versions had additional edge cases even when attempting sanitization; patched in 3.5.0+. Still: never rely on “string sanitization” alone—prefer structured creation or proven sanitizers. ([GitHub][16])
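The grep step from the detection hints above can be sketched as a small sink finder over source text. It reports candidate sinks only; tracing whether untrusted data actually reaches them still needs manual review.

```javascript
// jQuery HTML-insertion sinks from the detection hints above.
const HTML_SINKS = [
  ".html(", ".append(", ".prepend(", ".before(", ".after(",
  ".replaceWith(", ".wrap(", ".wrapAll(", ".wrapInner(",
];

// Scan a source blob and return { line, sink } hits (1-based line numbers).
function findHtmlSinks(source) {
  const findings = [];
  source.split("\n").forEach((line, i) => {
    for (const sink of HTML_SINKS) {
      if (line.includes(sink)) findings.push({ line: i + 1, sink });
    }
  });
  return findings;
}
```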
---
### JQ-XSS-002: If rendering user-controlled HTML is required, it MUST be sanitized with a proven HTML sanitizer
Severity: Medium (High if rich HTML is attacker-controlled and sanitizer is weak/misconfigured)
Required:
* MUST NOT “roll your own” HTML sanitizer with regexes.
* If user-controlled HTML must be displayed (e.g., rich text comments), MUST sanitize using a well-maintained HTML sanitizer and a restrictive allowlist.
* DOMPurify is a common choice; use conservative configuration and keep it updated. ([GitHub][17])
* Where available, MAY consider the browser HTML Sanitizer API (note: limited browser availability). ([MDN Web Docs][18])
* SHOULD pair sanitization with CSP and, where feasible, Trusted Types for defense in depth. ([OWASP Cheat Sheet Series][9])
Insecure patterns:
* Regex-based “strip `<script>`” or “escape `<`” attempts followed by `.html()` insertion.
* DOMPurify (or similar) configured to allow overly broad tags/attributes, or a configuration that's not reviewed.
Detection hints:
* Search for “sanitize” helper functions, regex replacing `<`/`>` patterns, or “allow all tags” configs.
* Identify features that render user-generated “rich text” or “custom HTML”.
* Check if sanitizer results are inserted with `.html()` or equivalent sinks.
Fix:
* Introduce a sanitizer with strict allowlist.
* Centralize the “sanitize then inject” pattern into a single reviewed module.
* Add regression tests covering representative malicious inputs (don't store payloads in logs or telemetry).
False positive notes:
* If content is guaranteed trusted (e.g., compiled templates shipped by you), document the trust boundary and why it is not attacker-controlled.
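The centralized "sanitize then inject" module can be sketched as below. It assumes DOMPurify is available via npm; the allowlist is an illustrative conservative starting point, not an official recommendation, and the option names are DOMPurify's documented config keys.

```javascript
// Conservative rich-text allowlist: formatting, links, lists. No scripts,
// no event-handler attributes, no data-* attributes.
const RICH_TEXT_CONFIG = {
  ALLOWED_TAGS: ["b", "i", "em", "strong", "a", "p", "ul", "ol", "li", "br"],
  ALLOWED_ATTR: ["href", "title"],
  ALLOW_DATA_ATTR: false,
};

// In the real module (browser build), roughly:
//   import DOMPurify from "dompurify";
//   export function renderRichText($target, dirtyHtml) {
//     $target.html(DOMPurify.sanitize(dirtyHtml, RICH_TEXT_CONFIG));
//   }
```

Keeping the config and the injection call in one reviewed module makes it much easier to audit than scattered `.html()` calls.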
---
### JQ-XSS-003: `$(untrustedString)` and `jQuery.parseHTML` MUST NOT process attacker-controlled markup
Severity: High (if attacker-controlled)
Required:
* MUST NOT pass attacker-controlled strings to `$()` when they might be interpreted as HTML.
* MUST treat `jQuery.parseHTML(html, …, keepScripts)` as a high-risk primitive; `keepScripts` MUST be `false` for any untrusted input. ([jQuery API][2])
Insecure patterns:
* `const $node = $(untrusted);`
* `$.parseHTML(untrusted, /* context */, true)` (scripts preserved)
Detection hints:
* Search for `$(` calls where the argument is not a static selector or static markup.
* Search for `$.parseHTML(` and inspect the `keepScripts` argument.
Fix:
* Use DOM creation with constant tag names and `.text()` for untrusted values.
* If parsing HTML is necessary, sanitize first (JQ-XSS-002) and keep scripts disabled.
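One defensive option is a guard in front of `$()` calls that take dynamic strings. The heuristic below (reject any string containing `<`) is an assumption chosen to be conservative; it is deliberately stricter than jQuery's actual string-parsing rules.

```javascript
// Refuse to pass a string to $() if it could be interpreted as HTML markup.
// Conservative: any "<" is rejected, even if it would be a valid selector.
function assertSelectorOnly(str) {
  if (typeof str !== "string" || str.includes("<")) {
    throw new Error("Refusing to pass potential HTML to $()");
  }
  return str;
}

// Usage: $(assertSelectorOnly(dynamicSelector)).addClass("active");
```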
---
### JQ-XSS-004: `.load()` MUST be treated as an HTML+script injection surface
Severity: Medium (High if URL/content is attacker-controlled)
Required:
* MUST NOT use `.load()` with attacker-controlled URLs or attacker-controlled HTML fragments.
* MUST understand jQuery `.load()` script behavior:
* Without a selector in the URL, content is passed to `.html()` before scripts are removed, which can execute scripts. ([jQuery API][3])
* SHOULD prefer `fetch()`/XHR to retrieve data, then render with safe DOM creation or sanitize explicitly.
Insecure patterns:
* `$("#target").load(untrustedUrl)`
* `$("#target").load("/path?param=" + untrusted)`
Detection hints:
* Search for `.load(` across JS/TS files.
* Identify whether a selector is appended to the URL (the behavior differs). ([jQuery API][3])
* Trace whether the URL can be influenced by user input.
Fix:
* Replace `.load()` with:
* `fetch()` to retrieve JSON, then render via `.text()` / node construction, or
* `fetch()` to retrieve HTML, sanitize it, then inject.
* If `.load()` must remain, ensure the URL is constant or strictly allowlisted and the returned content is trusted.
---
### JQ-EXEC-001: Dynamic script execution and script fetching MUST NOT be reachable from untrusted input
Severity: High
Required:
* MUST NOT fetch-and-execute scripts from untrusted or user-influenced URLs.
* MUST treat these as code execution primitives:
* `$.getScript(url)` executes the fetched script in the global context. ([jQuery API][4])
* `$.ajax({ dataType: "script" })` and other script-typed requests that execute responses.
* SHOULD remove these patterns unless there is a strong, reviewed justification.
Insecure patterns:
* `$.getScript(untrustedUrl)`
* `$.ajax({ url: untrustedUrl, dataType: "script" })`
* Dynamic `<script src=...>` injection where `src` is derived from untrusted input.
Detection hints:
* Search for `getScript(`, `dataType: "script"`, `globalEval`, `eval`, `new Function`.
* Look for “plugin loader” or “theme loader” features that accept URLs.
Fix:
* Bundle scripts at build time.
* If runtime-loading is required, restrict to allowlisted, versioned, integrity-checked assets (and ideally still avoid runtime code loading).
---
### JQ-AJAX-001: JSONP MUST be disabled unless the endpoint is fully trusted (and even then, avoid)
Severity: Medium (High if attacker can influence URL/endpoint)
Required:
* MUST NOT use JSONP for untrusted endpoints because it executes JavaScript responses.
* When using `$.ajax`, MUST explicitly disable JSONP for non-fully-trusted targets; jQuery's own docs recommend setting `jsonp: false` “for security reasons” if you don't trust the target. ([jQuery API][5])
* SHOULD prefer CORS with JSON (`dataType: "json"`) and explicit origin allowlists server-side.
Insecure patterns:
* `dataType: "jsonp"`
* URLs containing `callback=?` or patterns that trigger JSONP behavior; callback parameters are historically XSS vectors.
* `$.get(untrustedUrl)` without pinning `dataType` and disabling JSONP (risk depends on options and jQuery behavior)
Detection hints:
* Search for `jsonp`, `dataType: "jsonp"`, `callback=?`.
* Search for cross-domain AJAX where the URL is not hard-coded or allowlisted.
Fix:
* Use JSON over HTTPS with CORS configured server-side.
* Set:
* `dataType: "json"`
* `jsonp: false` (defense in depth when URL might be ambiguous) ([jQuery API][5])
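The fix can be centralized in a small options helper so per-call code cannot accidentally re-enable JSONP. The option names are standard `jQuery.ajax` settings; merging the safe defaults last makes them win over any caller-supplied values.

```javascript
// JSONP-free request defaults, applied last so they cannot be overridden.
const SAFE_JSON_DEFAULTS = {
  dataType: "json",
  jsonp: false, // never interpret responses as script via JSONP
};

function safeAjaxOptions(options) {
  return Object.assign({}, options, SAFE_JSON_DEFAULTS);
}

// Usage: $.ajax(safeAjaxOptions({ url: "/api/items" }));
```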
---
### JQ-AJAX-002: State-changing AJAX requests using cookie auth MUST be CSRF-protected
Severity: High
NOTE: This only matters with cookie-based auth. If the request uses an Authorization header instead, there is no CSRF potential.
Required:
* If authentication uses cookies, MUST protect state-changing requests (POST/PUT/PATCH/DELETE) against CSRF.
* SHOULD use server-verified CSRF tokens; for AJAX calls, tokens are commonly sent in a custom header. ([OWASP Cheat Sheet Series][19])
* MUST NOT treat “it's an AJAX request” as CSRF protection by itself.
Insecure patterns:
* `$.post("/transfer", {...})` or `$.ajax({ method: "POST", ... })` with cookie auth and no CSRF token/header.
* “CSRF protection” that only checks for `X-Requested-With` (defense-in-depth only, not primary).
Detection hints:
* Enumerate state-changing AJAX calls and locate whether they include CSRF tokens.
* Identify how the server expects CSRF validation (meta tag, cookie-to-header double submit, synchronizer token, etc.).
Fix:
* Add CSRF token inclusion in a centralized place, e.g., `$.ajaxSetup({ headers: { "X-CSRF-Token": token } })`, and ensure server verifies.
* Follow OWASP CSRF guidance for token properties and validation. ([OWASP Cheat Sheet Series][19])
False positive notes:
* If auth is not cookie-based (e.g., Authorization header bearer token) CSRF risk is different; verify actual auth mechanism.
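The centralized-header fix can be sketched as below. The header name `X-CSRF-Token` and the meta-tag convention are assumptions; use whatever name and token source your server framework actually validates.

```javascript
// Build the CSRF header object once, failing loudly if the token is missing
// (a silent empty header would just move the failure to the server).
function csrfHeaders(token) {
  if (!token) throw new Error("Missing CSRF token");
  return { "X-CSRF-Token": token };
}

// In the browser, typically:
//   const token = $('meta[name="csrf-token"]').attr("content");
//   $.ajaxSetup({ headers: csrfHeaders(token) });
```

The server must still verify the token; the client-side setup only ensures it is consistently sent.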
---
### JQ-ATTR-001: Untrusted values MUST NOT be written into dangerous attributes without validation/allowlisting
Severity: Low (High for events like onclick)
Required:
* MUST validate/allowlist URLs written into `href`, `src`, `action`, etc.
* MUST block dangerous schemes; `javascript:` URLs are discouraged because they can execute code. ([MDN Web Docs][6])
* MUST NOT set event-handler attributes (`onclick`, `onerror`, etc.) from strings.
* SHOULD avoid writing untrusted strings into `style` attributes; prefer toggling predefined CSS classes.
Insecure patterns:
* `$("a").attr("href", untrustedUrl)`
* `$("img").attr("src", untrustedUrl)`
* `$(el).attr("style", untrustedCss)`
* `$(el).attr("onclick", untrustedJs)`
Detection hints:
* Search for `.attr("href"`, `.attr("src"`, `.attr("style"`, `.prop("href"`, `.prop("src"`.
* Trace whether inputs come from URL params, server JSON, DOM, or storage.
Fix:
* Parse and validate URLs with `new URL(value, location.origin)` and allowlist protocols (`https:` etc.) and hostnames when needed.
* For navigation targets, prefer relative paths you construct rather than full URLs.
* Replace `style` strings with `addClass/removeClass` using predefined class names.
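The URL-validation step can be sketched as a helper that returns a normalized URL only when the scheme is allowlisted. The base origin is a parameter here so the sketch also runs outside the browser; in page code you would pass `location.origin`.

```javascript
// Validate an untrusted URL: resolve it against a base origin and accept it
// only if the resulting protocol is on the allowlist. Returns null otherwise.
function safeUrl(value, baseOrigin, allowedProtocols = ["https:", "http:"]) {
  let url;
  try {
    url = new URL(value, baseOrigin);
  } catch {
    return null; // unparseable input
  }
  return allowedProtocols.includes(url.protocol) ? url.href : null;
}

// Usage: const href = safeUrl(untrusted, location.origin);
//        if (href) $("a.profile").attr("href", href);
```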
---
### JQ-SELECTOR-001: User-controlled selector fragments MUST be escaped with `jQuery.escapeSelector`
Severity: Medium (can become High if it enables wrong-element selection in security-relevant UI)
Required:
* If you must select by an ID/class that can contain special CSS characters, SHOULD use `jQuery.escapeSelector()` (available in jQuery 3.0+). ([jQuery API][20])
* MUST NOT concatenate raw attacker-controlled strings into selector expressions.
Insecure patterns:
* `$("#" + untrustedId)`
* `$("[data-id='" + untrusted + "']")` (especially without strict quoting/escaping)
Detection hints:
* Search for `"#" +`, `"." +`, or template strings used inside `$(` selectors.
* Look for “select by user-supplied id”.
Fix:
* `$("#" + $.escapeSelector(untrustedId))` ([jQuery API][20])
* Prefer stable internal IDs over user-derived selectors.
Notes:
* This is often “robustness”, but it can become security-relevant if incorrect selection causes UI to reveal/modify the wrong data or skip security-related prompts.
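For codebases stuck below jQuery 3.0 (no `$.escapeSelector`), one option matching the "prefer stable internal IDs" advice is to reject rather than escape. The pattern below is deliberately stricter than what CSS identifiers allow; that strictness is an assumption of this sketch.

```javascript
// Build an id selector only from plain alphanumeric/dash/underscore ids;
// anything else is treated as hostile rather than escaped.
function strictIdSelector(id) {
  if (!/^[A-Za-z][A-Za-z0-9_-]*$/.test(id)) {
    throw new Error("Unexpected characters in element id");
  }
  return "#" + id;
}

// On jQuery 3+, prefer the built-in: $("#" + $.escapeSelector(untrustedId))
```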
---
### JQ-PROTOTYPE-001: Do not deep-merge untrusted objects; prevent prototype pollution
Severity: Medium
Required:
* MUST NOT deep-merge (`$.extend(true, …)`) attacker-controlled objects into application objects without filtering dangerous keys.
* MUST ensure jQuery is >= 3.4.0 to avoid CVE-2019-11358 prototype pollution behavior. ([NVD][13])
Insecure patterns:
* `$.extend(true, target, untrustedObj)`
* `$.extend(true, {}, defaults, untrustedObj)` where untrustedObj comes from URL/JSON/storage
Detection hints:
* Search for `$.extend(true` and inspect sources of merged objects.
* Search for “merge options” / “apply config” patterns using untrusted JSON.
Fix:
* Prefer:
* Shallow merges with an allowlisted set of keys, or
* A safe merge helper that explicitly rejects `__proto__`, `prototype`, `constructor`, and nested occurrences.
* Keep jQuery patched.
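The safe merge helper can be sketched as below: copy only own enumerable keys, skip the dangerous names at every nesting level, and recurse for plain objects. For real use you may also want an allowlist of expected keys, as the rule suggests.

```javascript
// Key names that enable prototype pollution when deep-merged.
const DANGEROUS_KEYS = new Set(["__proto__", "prototype", "constructor"]);

// Deep-merge source into target, dropping dangerous keys recursively.
function safeMerge(target, source) {
  for (const key of Object.keys(source)) {
    if (DANGEROUS_KEYS.has(key)) continue;
    const value = source[key];
    if (value && typeof value === "object" && !Array.isArray(value)) {
      const nested =
        target[key] && typeof target[key] === "object" ? target[key] : {};
      target[key] = safeMerge(nested, value);
    } else {
      target[key] = value;
    }
  }
  return target;
}
```

Note that `JSON.parse` creates `__proto__` as an ordinary own property, which is exactly how polluted payloads typically arrive from the network.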
---
### JQ-CSP-001: CSP and Trusted Types SHOULD be used to make DOM XSS harder to introduce and exploit
Severity: Medium
Required:
* SHOULD deploy CSP as defense-in-depth against XSS. ([OWASP Cheat Sheet Series][9])
* If enabling Trusted Types (`require-trusted-types-for`), MUST ensure DOM injection goes through Trusted Types policies. ([MDN Web Docs][11])
* When using jQuery 4, SHOULD take advantage of its Trusted Types support (TrustedHTML inputs). ([blog.jquery.com][7])
Insecure patterns:
* “Fixing” a jQuery feature by weakening CSP (`script-src 'unsafe-inline'` / `'unsafe-eval'`) without a compensating plan.
* No CSP on applications that render user content or manipulate DOM heavily.
Detection hints:
* Look for CSP headers (server configs, framework middleware, meta tags).
* If not visible in repo, flag as “verify at edge/runtime”.
Fix:
* Add CSP incrementally; start by eliminating inline scripts and inline event handlers, then tighten `script-src`.
* Add Trusted Types where supported and feasible.
---
## 5) Practical scanning heuristics (how to “hunt”)
When actively scanning, use these high-signal patterns:
* jQuery version / sourcing:
* `jquery-*.js` in `vendor/` or `static/`
* `package.json` dependency `jquery` pinned to old versions
* CDN script tags lacking `integrity`/`crossorigin` ([jquery.com][8])
* HTML injection sinks (DOM XSS):
* `.html(`, `.append(`, `.prepend(`, `.before(`, `.after(`, `.replaceWith(`, `.wrap(`
* `$(` where argument might be HTML / template strings
* `$.parseHTML(` especially with `keepScripts=true` ([jQuery API][2])
* `.load(` (and whether selector is appended; script behavior differs) ([jQuery API][3])
* Script execution / dynamic code:
* `$.getScript(`, `dataType: "script"` ([jQuery API][4])
* `dataType: "jsonp"` or `jsonp:` usage; `callback=?` patterns ([jQuery API][5])
* `eval`, `new Function`, `setTimeout("…")`, `$.globalEval`
* Dangerous attribute writes:
* `.attr("href", …)`, `.attr("src", …)`, `.attr("style", …)`
* Any assignment of `javascript:`-like schemes or suspicious URL construction ([MDN Web Docs][6])
* Selector construction:
* `$("#" + user)` and similar; fix via `$.escapeSelector` ([jQuery API][20])
* Prototype pollution:
* `$.extend(true, …, userObj)`; ensure jQuery >= 3.4.0 and filter dangerous keys ([NVD][13])
* CSRF posture for AJAX:
* `$.post(` / `$.ajax({ method: ... })` with cookies and no CSRF token/header ([OWASP Cheat Sheet Series][19])
* Defense-in-depth:
* Absence of CSP/security headers in configs (or not visible; require runtime verification) ([OWASP Cheat Sheet Series][12])
Always try to confirm:
* data origin (untrusted vs trusted)
* sink type (HTML insertion / script execution / attribute / selector / object merge)
* protective controls present (sanitizer, allowlists, CSP, Trusted Types, CSRF validation)
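The hunting pass above can be sketched as a mapping from high-signal strings to rule IDs, run over source blobs. Pure string matching; the hits are leads to confirm against data origin and controls, not findings by themselves.

```javascript
// Map a subset of the high-signal patterns above to their rule IDs.
const HEURISTICS = [
  { rule: "JQ-XSS-001", needles: [".html(", ".append(", ".replaceWith("] },
  { rule: "JQ-XSS-003", needles: ["$.parseHTML("] },
  { rule: "JQ-EXEC-001", needles: ["getScript(", 'dataType: "script"'] },
  { rule: "JQ-AJAX-001", needles: ['dataType: "jsonp"', "callback=?"] },
  { rule: "JQ-PROTOTYPE-001", needles: ["$.extend(true"] },
];

// Scan a source blob and return { rule, line } leads (1-based line numbers).
function huntLeads(source) {
  const leads = [];
  source.split("\n").forEach((line, i) => {
    for (const h of HEURISTICS) {
      if (h.needles.some((n) => line.includes(n))) {
        leads.push({ rule: h.rule, line: i + 1 });
      }
    }
  });
  return leads;
}
```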
---
## 6) Sources (accessed 2026-01-27)
Primary jQuery project documentation and release notes:
* jQuery 4.0.0 release notes (Trusted Types/CSP changes; version info): `https://blog.jquery.com/2026/01/17/jquery-4-0-0/`. ([blog.jquery.com][7])
* Download jQuery (latest version info; CDN + SRI guidance): `https://jquery.com/download/`. ([jquery.com][8])
* jQuery API: `.html()`: `https://api.jquery.com/html/`. ([jQuery API][21])
* jQuery API: `.text()`: `https://api.jquery.com/text/`. ([jQuery API][14])
* jQuery API: `.append()`: `https://api.jquery.com/append/`. ([jQuery API][22])
* jQuery API: `.load()` (script execution behavior): `https://api.jquery.com/load/`. ([jQuery API][3])
* jQuery API: `jQuery.parseHTML(…, keepScripts)`: `https://api.jquery.com/jQuery.parseHTML/`. ([jQuery API][2])
* jQuery API: `$.ajax()` (`jsonp: false` security note): `https://api.jquery.com/jQuery.ajax/`. ([jQuery API][5])
* jQuery API: `$.getScript()` (executes script): `https://api.jquery.com/jQuery.getScript/`. ([jQuery API][4])
* jQuery API: `jQuery.escapeSelector()`: `https://api.jquery.com/jQuery.escapeSelector/`. ([jQuery API][20])
jQuery vulnerabilities / advisories:
* NVD CVE-2019-11358 (prototype pollution; jQuery < 3.4.0): `https://nvd.nist.gov/vuln/detail/CVE-2019-11358`. ([NVD][13])
* NVD CVE-2020-11022 (XSS risk in DOM manipulation methods; patched in 3.5.0): `https://nvd.nist.gov/vuln/detail/CVE-2020-11022`. ([NVD][1])
* NVD CVE-2020-11023 (XSS risk involving `<option>`; patched in 3.5.0): `https://nvd.nist.gov/vuln/detail/CVE-2020-11023`. ([NVD][23])
* GitHub Security Advisory GHSA-gxr4-xjj5-5px2 (jQuery htmlPrefilter XSS; patched in 3.5.0): `https://github.com/jquery/jquery/security/advisories/GHSA-gxr4-xjj5-5px2`. ([GitHub][16])
OWASP Cheat Sheet Series (web app security foundations relevant to jQuery usage):
* XSS Prevention: `https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html`. ([OWASP Cheat Sheet Series][24])
* DOM-based XSS Prevention: `https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html`. ([OWASP Cheat Sheet Series][25])
* CSRF Prevention: `https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html`. ([OWASP Cheat Sheet Series][19])
* HTTP Security Headers: `https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html`. ([OWASP Cheat Sheet Series][12])
* Content Security Policy Cheat Sheet: `https://cheatsheetseries.owasp.org/cheatsheets/Content_Security_Policy_Cheat_Sheet.html`. ([OWASP Cheat Sheet Series][9])
Browser/platform references (SRI, CSP, Trusted Types, and dangerous URL schemes):
* MDN: Subresource Integrity (SRI): `https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Subresource_Integrity`. ([MDN Web Docs][26])
* W3C: SRI specification: `https://www.w3.org/TR/sri-2/`. ([W3C][27])
* MDN: CSP guide: `https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CSP`. ([MDN Web Docs][28])
* MDN: `require-trusted-types-for` directive: `https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy/require-trusted-types-for`. ([MDN Web Docs][11])
* MDN: Trusted Types API: `https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API`. ([MDN Web Docs][29])
* W3C: Trusted Types specification: `https://www.w3.org/TR/trusted-types/`. ([W3C][10])
* MDN: `javascript:` URL scheme warning: `https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Schemes/javascript`. ([MDN Web Docs][6])
* DOMPurify project documentation: `https://github.com/cure53/DOMPurify`. ([GitHub][17])
[1]: https://nvd.nist.gov/vuln/detail/cve-2020-11022?utm_source=chatgpt.com "CVE-2020-11022 Detail - NVD"
[2]: https://api.jquery.com/jQuery.parseHTML/?utm_source=chatgpt.com "jQuery.parseHTML()"
[3]: https://api.jquery.com/load/?utm_source=chatgpt.com ".load() | jQuery API Documentation"
[4]: https://api.jquery.com/jQuery.getScript/?utm_source=chatgpt.com "jQuery.getScript()"
[5]: https://api.jquery.com/jQuery.ajax/?utm_source=chatgpt.com "jQuery.ajax()"
[6]: https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Schemes/javascript?utm_source=chatgpt.com "javascript: URLs - URIs - MDN Web Docs"
[7]: https://blog.jquery.com/2026/01/17/jquery-4-0-0/ "jQuery 4.0.0 | Official jQuery Blog"
[8]: https://jquery.com/download/ "Download jQuery | jQuery"
[9]: https://cheatsheetseries.owasp.org/cheatsheets/Content_Security_Policy_Cheat_Sheet.html?utm_source=chatgpt.com "Content Security Policy - OWASP Cheat Sheet Series"
[10]: https://www.w3.org/TR/trusted-types/?utm_source=chatgpt.com "Trusted Types"
[11]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy/require-trusted-types-for?utm_source=chatgpt.com "Content-Security-Policy: require-trusted-types-for directive"
[12]: https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html?utm_source=chatgpt.com "HTTP Security Response Headers Cheat Sheet"
[13]: https://nvd.nist.gov/vuln/detail/cve-2019-11358?utm_source=chatgpt.com "CVE-2019-11358 Detail - NVD"
[14]: https://api.jquery.com/text/?utm_source=chatgpt.com ".text() | jQuery API Documentation"
[15]: https://api.jquery.com/val/?utm_source=chatgpt.com ".val() | jQuery API Documentation"
[16]: https://github.com/jquery/jquery/security/advisories/GHSA-gxr4-xjj5-5px2 "Potential XSS vulnerability in jQuery.htmlPrefilter and related methods · Advisory · jquery/jquery · GitHub"
[17]: https://github.com/cure53/DOMPurify?utm_source=chatgpt.com "DOMPurify - a DOM-only, super-fast, uber-tolerant XSS ..."
[18]: https://developer.mozilla.org/en-US/docs/Web/API/HTML_Sanitizer_API?utm_source=chatgpt.com "HTML Sanitizer API - MDN Web Docs"
[19]: https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html?utm_source=chatgpt.com "Cross-Site Request Forgery Prevention Cheat Sheet"
[20]: https://api.jquery.com/jQuery.escapeSelector/?utm_source=chatgpt.com "jQuery.escapeSelector()"
[21]: https://api.jquery.com/html/?utm_source=chatgpt.com ".html() | jQuery API Documentation"
[22]: https://api.jquery.com/append/?utm_source=chatgpt.com ".append() | jQuery API Documentation"
[23]: https://nvd.nist.gov/vuln/detail/cve-2020-11023?utm_source=chatgpt.com "CVE-2020-11023 Detail - NVD"
[24]: https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html?utm_source=chatgpt.com "Cross Site Scripting Prevention - OWASP Cheat Sheet Series"
[25]: https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html?utm_source=chatgpt.com "DOM based XSS Prevention Cheat Sheet"
[26]: https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Subresource_Integrity?utm_source=chatgpt.com "Subresource Integrity - Security - MDN Web Docs"
[27]: https://www.w3.org/TR/sri-2/?utm_source=chatgpt.com "Subresource Integrity"
[28]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CSP?utm_source=chatgpt.com "Content Security Policy (CSP) - HTTP - MDN Web Docs"
[29]: https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API?utm_source=chatgpt.com "Trusted Types API - MDN Web Docs"

View File

@@ -0,0 +1,990 @@
# React (JavaScript/TypeScript) Web Security Spec (React 19.x, TypeScript 5.x)
This document is designed as a **security spec** that supports:
1. **Secure-by-default code generation** for new React code.
2. **Security review / vulnerability hunting** in existing React code (passive “notice issues while working” and active “scan the repo and report findings”).
It is intentionally written as a set of **normative requirements** (“MUST/SHOULD/MAY”) plus **audit rules** (what bad patterns look like, how to detect them, and how to fix/mitigate them).
---
## 0) Safety, boundaries, and anti-abuse constraints (MUST FOLLOW)
* MUST NOT request, output, log, or commit secrets (API keys, OAuth client secrets, private keys, session cookies, JWTs, signing keys).
* Frontend note: anything shipped to the browser is observable by end users and attackers (view-source, devtools, proxies); never treat client code or “env vars in the bundle” as secret. ([create-react-app.dev][1])
* MUST NOT “fix” security by disabling protections (e.g., turning off CSP to “make it work”, adding `unsafe-inline`/`unsafe-eval` without a documented, constrained plan, disabling CSRF protections when using cookies, widening CORS, skipping sanitization, or “temporary” bypasses that ship). ([OWASP Cheat Sheet Series][2])
* MUST provide **evidence-based findings** during audits: cite file paths, code snippets, and configuration values that justify the claim.
* MUST treat uncertainty honestly: if a protection might exist in infra (CDN/WAF/reverse proxy), report it as “not visible in app code; verify via runtime headers / edge config”.
* MUST assume any data that crosses a trust boundary (URL, storage, network, postMessage, third-party scripts) can be attacker-influenced unless proven otherwise (see §2.1).
---
## 1) Operating modes
### 1.1 Generation mode (default)
When asked to write new React code or modify existing code:
* MUST follow every **MUST** requirement in this spec.
* SHOULD follow every **SHOULD** requirement unless the user explicitly says otherwise.
* MUST prefer safe-by-default APIs and proven libraries over custom security code.
* MUST avoid introducing new risky sinks (raw HTML insertion, direct DOM sinks like `innerHTML`, dynamic code execution, untrusted redirects/navigation, third-party script injection, unsafe token storage, etc.). ([MDN Web Docs][3])
### 1.2 Passive review mode (always on while editing)
While working anywhere in a React repo (even if the user did not ask for a security scan):
* MUST “notice” violations of this spec in touched/nearby code.
* SHOULD mention issues as they come up, with a brief explanation + safe fix.
### 1.3 Active audit mode (explicit scan request)
When the user asks to “scan”, “audit”, or “hunt for vulns”:
* MUST systematically search the codebase for violations of this spec.
* MUST output findings in a structured format (see §2.3).
Recommended audit order:
1. App entrypoints, build tooling (Vite/Webpack/CRA/Next), deployment configs, CDN/static hosting config.
2. Secrets & configuration exposure (env vars, runtime config injection, source maps).
3. Rendering of untrusted data (XSS/DOM XSS), especially `dangerouslySetInnerHTML`, markdown/HTML renderers, URL attributes.
4. Direct DOM usage and dangerous JS execution (`innerHTML`, `eval`, `new Function`, `document.write`, etc.).
5. Auth & session patterns (token storage, cookies, CSRF interactions, OAuth flows).
6. Network layer (axios/fetch wrappers, dynamic base URLs, credentialed requests, data exfil risks).
7. Navigation & redirect handling (open redirects, `window.location`, `target=_blank`, `window.open`).
8. Third-party scripts/tags/analytics and integrity controls (CSP, SRI).
9. Service worker/PWA behavior (HTTPS, caching rules, update strategy).
10. Security headers posture (CSP, clickjacking, nosniff, referrer policy) in app or at the edge. ([OWASP Cheat Sheet Series][2])
---
## 2) Definitions and review guidance
### 2.1 Untrusted input (treat as attacker-controlled unless proven otherwise)
Examples include:
* URL-derived data: `window.location`, query params, hash fragments, route params.
* Any data from browser storage: `localStorage`, `sessionStorage`, `IndexedDB` (including data previously written by the app—because XSS or extensions can tamper with it). ([OWASP Cheat Sheet Series][4])
* Any data from cross-window messaging: `window.postMessage` payloads. ([OWASP Cheat Sheet Series][4])
* Any data from remote APIs, webhooks proxied to the client, GraphQL responses, CMS content, feature flag services.
* Any persisted user content (profiles, comments, rich text, markdown) rendered in the UI.
* Any data produced by third-party scripts or tag managers (treat as untrusted unless strongly controlled). ([OWASP Cheat Sheet Series][5])
### 2.2 State-changing request (frontend perspective)
A request is state-changing if it can create/update/delete data, change auth/session state, trigger side effects (purchase, email send, webhook), or initiate privileged actions.
Frontend-specific note:
* State changes are often triggered by `fetch/axios` calls or form submissions. If authentication is cookie-based, these calls can be CSRF-relevant (§4 REACT-CSRF-001). ([OWASP Cheat Sheet Series][6])
### 2.3 Required audit finding format
For each issue found, output:
* Rule ID:
* Severity: Critical / High / Medium / Low
* Location: file path + component/function + line(s)
* Evidence: the exact code/config snippet
* Impact: what could go wrong, who can exploit it
* Fix: safe change (prefer minimal diff)
* Mitigation: defense-in-depth if immediate fix is hard
* False positive notes: what to verify if uncertain
---
## 3) Secure baseline: minimum production configuration (MUST in production)
This is the smallest “production baseline” that prevents common React frontend misconfigurations.
### 3.1 Production build and configuration hygiene (MUST)
* MUST ship a production build (minified, no dev-only overlays/tools, correct mode flags).
* MUST ensure build-time configuration does not embed secrets into the shipped JS/HTML/CSS. Build-time “environment variables” are not secret; treat them as public. ([create-react-app.dev][1])
* SHOULD treat source maps as sensitive operational artifacts:
* Either don't publish them publicly, or publish them only where intended (e.g., behind auth or to an error-reporting provider), because they can reveal code structure and internal URLs.
### 3.2 Browser-enforced protections (SHOULD, but baseline expectation for modern apps)
* SHOULD deploy a CSP as defense-in-depth against XSS, and keep it compatible with your React build (avoid `unsafe-inline` and `unsafe-eval` unless strictly necessary and documented). ([OWASP Cheat Sheet Series][2])
* SHOULD use Subresource Integrity (SRI) for any third-party script/style loaded from a CDN (or self-host instead). ([MDN Web Docs][7])
* SHOULD enable clickjacking defenses via `frame-ancestors` (CSP) and/or `X-Frame-Options`, unless embedding is an explicit product requirement. ([MDN Web Docs][8])
### 3.3 High-risk features baseline (MUST if used)
* If rendering any user-provided HTML/markdown/rich text:
* MUST sanitize before insertion and avoid raw DOM sinks. ([OWASP Cheat Sheet Series][9])
* If using service workers / PWA:
* MUST serve over HTTPS and implement a safe caching/update strategy (service workers are powerful request/response proxies). ([MDN Web Docs][10])
---
## 4) Rules (generation + audit)
Each rule contains: required practice, insecure patterns, detection hints, and remediation.
### REACT-CONFIG-001: Never embed secrets in the client bundle (env vars are public)
Severity: Critical (if secrets exposed)
Required:
* MUST NOT place secrets in React code, in `public/` assets, or in build-time environment variables intended for client consumption.
* MUST assume any value available to the React app at runtime can be extracted by an attacker.
Insecure patterns:
* Using build-time env vars for secrets:
* `process.env.REACT_APP_*` containing private keys or credentials.
* `import.meta.env.VITE_*` containing secrets.
* Hard-coded secrets in JS/TS, `.env` committed, or secrets in `public/config.json` served to all users.
Detection hints:
* Search for:
* `REACT_APP_`, `VITE_`, `NEXT_PUBLIC_`, `process.env.`, `import.meta.env.`
* `apiKey`, `secret`, `token`, `private`, `password`, `client_secret`
* Inspect `public/` for runtime config JSON.
Fix:
* Move secrets server-side (API, BFF, serverless function).
* Use a backend to mint short-lived, scoped tokens if the browser needs to call third-party APIs.
Notes:
* CRA explicitly warns not to store secrets and notes env vars are embedded into the build and visible to anyone inspecting files. ([create-react-app.dev][1])
* Vite explicitly notes that variables exposed to client code end up in the client bundle and should not contain sensitive info. ([vitejs][11])
---
### REACT-XSS-001: Do not use `dangerouslySetInnerHTML` with untrusted content (sanitize or avoid)
Severity: High (Only if you can prove attacker-controlled HTML reaches it)
Required:
* MUST avoid `dangerouslySetInnerHTML` unless absolutely necessary.
* If it must be used:
* MUST sanitize untrusted HTML with a proven sanitizer (e.g., DOMPurify) and an allowlist-oriented configuration.
* MUST keep the sanitization logic centralized and heavily reviewed.
* SHOULD add a CSP and consider Trusted Types (see REACT-TT-001).
Insecure patterns:
* `<div dangerouslySetInnerHTML={{ __html: userHtml }} />` where `userHtml` is from API/URL/storage.
* “Sanitization” done with regexes, ad-hoc stripping, or incomplete allowlists.
Detection hints:
* Grep: `dangerouslySetInnerHTML`, `__html:`
* Trace the origin of the HTML string (API/CMS/URL/localStorage).
Fix:
* Replace with safe rendering:
* Render structured data as React elements/components instead of HTML strings.
* If rich text is required, sanitize with DOMPurify (or equivalent) and render the sanitized output.
* Add CSP; remove dangerous sinks where possible.
Notes:
* React explicitly warns that `dangerouslySetInnerHTML` is dangerous and can introduce XSS if misused. ([React][12])
* OWASP explicitly calls out React's `dangerouslySetInnerHTML` without sanitization as a common framework “escape hatch” pitfall. ([OWASP Cheat Sheet Series][9])
* DOMPurify describes itself as an XSS sanitizer for HTML/SVG/MathML. ([GitHub][13])
---
### REACT-XSS-002: Rely on React's escaping-by-default behavior; do not bypass it
Severity: High (when bypassed)
Required:
* MUST render untrusted strings via normal JSX interpolation (`{value}`) and React props, which are escaped by default.
* MUST NOT build HTML strings from untrusted data and then inject them into the DOM via any means.
* SHOULD treat any “escape hatch” as high risk and require review.
Insecure patterns:
* Converting untrusted text into HTML and injecting it:
* `element.innerHTML = userValue`
* `document.write(userValue)`
* `insertAdjacentHTML(..., userValue)`
Detection hints:
* Grep for DOM sinks: `innerHTML`, `outerHTML`, `insertAdjacentHTML`, `document.write`, `DOMParser`, `createContextualFragment`.
Fix:
* Render text content through React (JSX) so it is escaped.
* If you truly need HTML, sanitize and apply REACT-XSS-001 + REACT-TT-001.
Notes:
* React documentation (JSX) states that React DOM escapes values embedded in JSX before rendering to help prevent injection attacks. ([React][14])
---
### REACT-DOM-001: Avoid DOM XSS injection sinks in React code (use safe alternatives)
Severity: High
Required:
* MUST avoid direct DOM injection sinks, even outside React rendering, unless strongly controlled.
* If a DOM sink is required:
* MUST ensure inputs are trusted/validated/sanitized.
* SHOULD enforce Trusted Types (REACT-TT-001).
Insecure patterns:
* `someEl.innerHTML = untrusted`
* `document.write(untrusted)`
* `new DOMParser().parseFromString(untrusted, 'text/html')` followed by insertion
Detection hints:
* Grep for: `innerHTML`, `outerHTML`, `document.write`, `DOMParser`, `Range().createContextualFragment`, `insertAdjacentHTML`
Fix:
* Prefer:
* `textContent` for text insertion.
* React rendering rather than manual DOM manipulation.
* A vetted sanitizer for any required HTML parsing.
Notes:
* Trusted Types documentation defines HTML sinks like `Element.innerHTML` and `document.write()` as injection sinks that can execute script when given attacker-controlled input. ([MDN Web Docs][3])
* OWASP HTML5 guidance recommends using `textContent` instead of `innerHTML` for assigning untrusted data. ([OWASP Cheat Sheet Series][4])
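For the rare code paths outside React where text must be written into markup, a minimal text-escaping helper is a safer fallback than raw string concatenation. This is a sketch for plain text only; actual HTML fragments still need a vetted sanitizer per REACT-XSS-001:

```typescript
// Escape the five HTML-significant characters so untrusted text can be
// embedded in markup as inert text. This is for text interpolation only;
// for real HTML fragments use a vetted sanitizer (e.g., DOMPurify) instead.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
};

function escapeHtml(untrusted: string): string {
  return untrusted.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}
```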
---
### REACT-URL-001: Validate and constrain untrusted URLs used in `href`, `src`, navigation, and redirects
Severity: High (only when the URLs are provably attacker-controlled)
Required:
* MUST treat any URL derived from untrusted input as dangerous.
* MUST allowlist schemes and (when applicable) hosts:
* Typically allow only `https:` (and maybe `http:` for localhost/dev) and relative URLs for in-app navigation.
* MUST explicitly block `javascript:` and dangerous `data:` uses unless you have specialized validation and a clear use case.
* SHOULD prefer same-site relative paths (e.g., `/settings`) over absolute URLs.
* MUST validate “returnTo/next/redirect” parameters (see REACT-REDIRECT-001).
Insecure patterns:
* `<img src={userProvidedUrl} />` (can be used for tracking / data exfil; also risky if used for scripts/iframes)
* `window.location = next`
* `navigate(next)` where `next` comes from query params without validation
Detection hints:
* Search for:
* `href={`, `src={`, `window.location`, `location.href`, `window.open`, `navigate(`, `redirectTo`, `returnTo`, `next=`
* Track whether the value is derived from URL/query/storage/API.
Fix:
* Implement a shared `safeUrl()` utility:
* Parse with `new URL(value, base)`
* Enforce scheme allowlist and host allowlist (or enforce same-origin)
* For redirects: allow only relative paths (starting with `/`) or a strict allowlist of absolute origins.
* Fall back to a safe default when validation fails.
Notes:
* OWASP explicitly notes React's `dangerouslySetInnerHTML` risk and also states React cannot safely handle `javascript:` or `data:` URLs without specialized validation. ([OWASP Cheat Sheet Series][9])
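The shared `safeUrl()` utility from the fix above might look like this; the scheme and host allowlists and the base origin are per-app assumptions:

```typescript
// Shared URL validator: returns a normalized URL string when the value
// passes the allowlists, or null so callers can fall back to a safe default.
const ALLOWED_SCHEMES = new Set(["https:"]);    // add "http:" for local dev if needed
const ALLOWED_HOSTS = new Set(["example.com"]); // hypothetical host allowlist

function safeUrl(value: string, base = "https://example.com"): string | null {
  let url: URL;
  try {
    url = new URL(value, base); // resolves relative paths against the app origin
  } catch {
    return null; // unparseable input
  }
  if (!ALLOWED_SCHEMES.has(url.protocol)) return null; // blocks javascript:, data:, etc.
  if (!ALLOWED_HOSTS.has(url.hostname)) return null;
  return url.toString();
}
```

Callers treat `null` as “render nothing / navigate to `/`” rather than passing the raw value through.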
---
### REACT-MARKUP-001: Markdown / rich text rendering must be configured safely
Severity: Medium
Required:
* MUST assume markdown/rich text can be attacker-controlled if it comes from users or CMS.
* MUST ensure raw HTML is not rendered unless sanitized.
* SHOULD prefer markdown renderers that:
* Do not allow raw HTML by default, or
* Can be configured to disallow raw HTML, or
* Sanitize HTML output before rendering.
Insecure patterns:
* Markdown rendering with “raw HTML passthrough” enabled (e.g., options/plugins that allow HTML).
* Rendering user-provided SVG/MathML/HTML inline without sanitization.
Detection hints:
* Search for common libraries and risky options:
* `marked`, `markdown-it`, `react-markdown`, `rehype-raw`, `sanitize: false`, `allowDangerousHtml`, etc.
* Look for `dangerouslySetInnerHTML` used with “markdown output”.
Fix:
* Disable raw HTML passthrough.
* Sanitize output with a proven sanitizer (e.g., DOMPurify) before rendering.
Notes:
* OWASP XSS guidance emphasizes that framework escape hatches require output encoding and/or HTML sanitization. ([OWASP Cheat Sheet Series][9])
---
### REACT-TT-001: Use Trusted Types (with CSP) to harden DOM XSS sinks where feasible
Severity: Low
Required:
* SHOULD consider enabling Trusted Types in report-only mode first, then enforce once violations are addressed.
* SHOULD centralize Trusted Types policies and treat them as high-risk code requiring review.
* MUST NOT create permissive policies that simply “pass through” untrusted strings.
Insecure patterns:
* A Trusted Types policy that returns the raw string without sanitization for HTML sinks.
* Many scattered policies across the codebase (hard to audit).
Detection hints:
* Search for:
* `trustedTypes.createPolicy`
* CSP directives: `require-trusted-types-for`, `trusted-types`
* Search for remaining DOM sinks (REACT-DOM-001).
Fix:
* Implement a small number of tightly scoped policies:
* HTML policy uses sanitizer (DOMPurify or equivalent).
* Script URL policy uses strict allowlists.
* Run in report-only mode, fix violations, then enforce.
Notes:
* MDN describes Trusted Types as a way to ensure input is transformed (commonly sanitized) before being passed to injection sinks, and highlights HTML sinks (`innerHTML`, `document.write`) and JS URL sinks (`script.src`). ([MDN Web Docs][3])
* The W3C Trusted Types spec frames this as reducing DOM XSS risk by locking down sinks to typed values created by reviewed policies. ([W3C][15])
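One way to keep policies tightly scoped is a factory that requires a sanitizer, so a permissive pass-through policy cannot be created by accident. A sketch follows; the policy name is an assumption, and in the browser you would pass `window.trustedTypes` plus a real sanitizer such as DOMPurify (the interface here is a structural stand-in that returns plain strings to keep the example simple):

```typescript
// Structural stand-in for the browser's trustedTypes factory (the real
// createHTML returns a TrustedHTML object; a string keeps the sketch simple).
interface TrustedTypesLike {
  createPolicy(
    name: string,
    rules: { createHTML(input: string): string },
  ): { createHTML(input: string): string };
}

// Create exactly one tightly scoped HTML policy. The sanitizer argument is
// required, so the policy cannot silently pass untrusted strings through.
function makeHtmlPolicy(tt: TrustedTypesLike, sanitize: (dirty: string) => string) {
  return tt.createPolicy("app-html", {
    createHTML: (input) => sanitize(input),
  });
}
```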
---
### REACT-CSP-001: Deploy and maintain a CSP as defense-in-depth (especially when rendering untrusted content)
Severity: Medium to High
Required:
* SHOULD deploy CSP in production; MUST do so for apps that render untrusted content or integrate third-party scripts.
* SHOULD avoid `unsafe-inline` and `unsafe-eval` when possible.
* SHOULD use CSP nonces/hashes for inline scripts if needed, and keep policy realistic.
* SHOULD use CSP to require/encourage SRI where appropriate.
Insecure patterns:
* No CSP at all on the app shell (SPA entry HTML).
* CSP that relies on `unsafe-inline`/`unsafe-eval` broadly without justification.
* `script-src *` or overly broad sources.
Detection hints:
* Look for CSP configuration:
* Server/CDN config, headers in `index.html` responses, or framework config.
* If absent in repo, mark as “verify at edge”.
Fix:
* Add CSP via HTTP response headers (preferred).
* Start with report-only to reduce breakage, then enforce.
Notes:
* OWASP describes CSP as “defense in depth” against XSS and notes it can help enforce SRI even on static sites, but should not be the only defense. ([OWASP Cheat Sheet Series][2])
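A report-only rollout of the header might look like this; the directive values are an illustrative starting point, not a recommended final policy (`report-uri` is deprecated in favor of `report-to` but remains the simplest way to collect violation reports):

```http
Content-Security-Policy-Report-Only: default-src 'self'; script-src 'self'; object-src 'none'; frame-ancestors 'none'; report-uri /csp-reports
```

Once the report stream is quiet, rename the header to `Content-Security-Policy` to enforce.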
---
### REACT-SRI-001: Use Subresource Integrity (SRI) for third-party scripts and styles (or self-host)
Severity: Low
Required:
* MUST treat third-party JS as equivalent to running arbitrary code in your origin.
* If loading from a CDN or third party:
* SHOULD use SRI (`integrity=...`) and `crossorigin` where applicable.
* SHOULD pin exact versions (avoid “latest” URLs).
* SHOULD prefer self-hosting for critical code.
Insecure patterns:
* `<script src="https://cdn.example.com/lib/latest.js"></script>` with no integrity.
* Tag managers that dynamically load arbitrary scripts without governance.
Detection hints:
* Search in `public/index.html`, templates, or SSR wrappers for:
* `<script src=`, `<link rel="stylesheet" href=`
* Tag manager snippets (GTM, Segment, etc.)
* Identify scripts loaded dynamically in runtime JS.
Fix:
* Add SRI hashes for stable third-party assets or self-host.
* Apply governance controls for tag managers (see REACT-3P-001).
Notes:
* MDN describes SRI as a security feature enabling browsers to verify fetched resources (e.g., from a CDN) haven't been manipulated by checking a cryptographic hash. ([MDN Web Docs][7])
* OWASP CSP guidance notes CSP can enforce SRI and is useful even on static sites. ([OWASP Cheat Sheet Series][2])
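A pinned third-party script with SRI looks roughly like this; the URL and hash are placeholders, and the real hash must be generated from the exact file you ship against:

```html
<!-- Pinned version + integrity hash: the browser refuses the file if it changes. -->
<script
  src="https://cdn.example.com/lib/2.4.1/lib.min.js"
  integrity="sha384-REPLACE_WITH_REAL_HASH"
  crossorigin="anonymous"></script>
```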
---
### REACT-3P-001: Third-party JavaScript and tag managers must be minimized and governed
Severity: High
Required:
* MUST minimize third-party scripts and treat each as a supply-chain risk.
* MUST know exactly what third-party JS executes in your origin and why.
* SHOULD implement governance:
* Review and pin versions (or mirror in-house).
* Restrict data access (data-layer approach).
* Use SRI and CSP; consider sandboxing untrusted UI in iframes where possible.
Insecure patterns:
* Unreviewed analytics/ads scripts running with full access to DOM, cookies, storage, and user data.
* Tag managers that can be changed by non-engineering roles with no change control.
Detection hints:
* Search for common vendor snippets in HTML/JS:
* GTM, Segment, Hotjar, FullStory, etc.
* Look for dynamic script insertion:
* `document.createElement('script')`, `.src = ...`, `.appendChild(script)`
Fix:
* Reduce to only necessary vendors.
* Where feasible:
* Self-host or mirror scripts.
* Use SRI.
* Limit data exposure via a controlled data layer.
Notes:
* OWASP notes third-party JS server compromise can inject malicious JS, and highlights risks like arbitrary code execution and disclosure of sensitive info to third parties. ([OWASP Cheat Sheet Series][5])
---
### REACT-AUTH-001: Token and session handling must be resilient to XSS (avoid sensitive storage in Web Storage)
Severity: Medium
Required:
* SHOULD avoid storing session identifiers or long-lived tokens in `localStorage` (and generally in Web Storage) because XSS can exfiltrate them.
* If tokens must exist client-side:
* SHOULD prefer in-memory storage with short lifetimes and refresh mechanisms.
* MUST scope and rotate tokens; avoid long-lived bearer tokens in persistent storage.
* SHOULD prefer HTTPOnly cookies for session tokens when possible (requires CSRF strategy: see REACT-CSRF-001).
Insecure patterns:
* `localStorage.setItem('token', ...)` / `sessionStorage.setItem('token', ...)` for auth tokens.
* Persisting refresh tokens in `localStorage`.
* Treating data from Web Storage as trusted.
Detection hints:
* Grep for: `localStorage.`, `sessionStorage.`, `setItem(`, `getItem(`, `token`, `jwt`, `refresh`
* Search auth code for “remember me” storing tokens persistently.
Fix:
* Move to HTTPOnly cookies (server change) + CSRF protections, or use short-lived in-memory tokens.
* Reduce token scope and lifetime.
Notes:
* OWASP HTML5 guidance recommends avoiding sensitive info and session identifiers in local storage and warns that a single XSS can steal all data in Web Storage. ([OWASP Cheat Sheet Series][4])
* OAuth browser-based apps guidance discusses that tokens stored in persistent browser storage like localStorage can be accessible to malicious JS (e.g., via XSS). ([IETF Datatracker][16])
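An in-memory token holder with a short lifetime is the usual replacement for Web Storage. A minimal sketch (the TTL and the refresh wiring are assumptions; the refresh call itself belongs in your auth layer):

```typescript
// Holds an access token in module memory only: nothing is written to
// localStorage/sessionStorage, so an XSS payload cannot simply read it back
// from persistent storage (it would have to be active at request time).
class InMemoryTokenStore {
  private token: string | null = null;
  private expiresAt = 0;

  set(token: string, ttlMs: number, now: number = Date.now()): void {
    this.token = token;
    this.expiresAt = now + ttlMs;
  }

  get(now: number = Date.now()): string | null {
    if (this.token === null || now >= this.expiresAt) return null; // expired: trigger refresh
    return this.token;
  }
}
```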
---
### REACT-CSRF-001: Cookie-authenticated, state-changing requests MUST be CSRF-protected
Severity: High
NOTE: If the application does not use cookie-based auth (for example, it sends credentials via the `Authorization` header instead), then CSRF is not a concern.
Required:
* If the app relies on cookies for authentication:
* MUST protect state-changing requests (POST/PUT/PATCH/DELETE) against CSRF.
* SHOULD include a CSRF token mechanism (synchronizer token or double-submit cookie) or other robust pattern appropriate to the backend.
* SHOULD use SameSite cookies as defense-in-depth, not as the sole defense.
Insecure patterns:
* `fetch('/api/transfer', { method: 'POST', credentials: 'include' })` with no CSRF token/header, relying only on cookies.
* Using GET for state-changing operations.
Detection hints:
* Enumerate state-changing network calls and check:
* Is `credentials: 'include'` or `withCredentials: true` used?
* Is a CSRF token header included (e.g., `X-CSRF-Token`)?
* Search for “csrf” utilities; if absent, treat as suspicious.
Fix:
* Add CSRF token flow:
* Fetch token from a safe endpoint and attach to state-changing requests.
* Validate server-side.
* Keep SameSite cookies and Origin/Referer validation as defense-in-depth.
Notes:
* OWASP CSRF guidance explains SameSite behavior (Lax/Strict/None) as a defense-in-depth technique and why Lax is often the usability/security balance, but it is not a complete substitute for CSRF protections. ([OWASP Cheat Sheet Series][6])
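The token-attachment step can be centralized in a small helper so no state-changing call forgets it. A sketch, assuming the `X-CSRF-Token` header name (match whatever your backend actually validates) and a minimal request-options shape standing in for fetch's `RequestInit`:

```typescript
// Minimal request-options shape (structural stand-in for fetch's RequestInit).
interface RequestOptions {
  method?: string;
  credentials?: "include" | "omit" | "same-origin";
  headers?: Record<string, string>;
}

const MUTATING = new Set(["POST", "PUT", "PATCH", "DELETE"]);

// Attach the CSRF token header to state-changing requests only.
// Centralizing this means every mutating call goes through one reviewed path.
function withCsrf(init: RequestOptions, csrfToken: string): RequestOptions {
  const method = (init.method ?? "GET").toUpperCase();
  if (!MUTATING.has(method)) return init; // safe methods carry no token
  return { ...init, headers: { ...init.headers, "X-CSRF-Token": csrfToken } };
}
```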
---
### REACT-AUTHZ-001: Do not rely on frontend-only authorization
Severity: High (only if used as primary protection)
Required:
* MUST treat all frontend authorization checks as UX only.
* MUST enforce authorization on the server for any protected resource or action.
Insecure patterns:
* “Protected” actions hidden in UI but callable by API without server checks.
* Client checks like `if (user.isAdmin) { showAdminPanel(); }` with no server-side enforcement.
Detection hints:
* Look for UI gating around sensitive actions and verify server endpoints enforce authorization.
* In a frontend-only audit, report as “client checks are not security; verify backend”.
Fix:
* Add/confirm server-side authorization checks.
* Keep frontend gating only as convenience.
Notes:
* This is a general web app security property; React cannot protect server resources by itself.
---
### REACT-NET-001: Prevent data exfiltration and credential leakage via dynamic outbound requests
Severity: Medium to High
Required:
* MUST avoid making authenticated requests to attacker-controlled origins.
* SHOULD avoid allowing user input to control request destination (scheme/host/port).
* SHOULD centralize network clients (fetch/axios) with:
* fixed `baseURL` (or strict allowlist),
* strict handling of redirects,
* explicit `credentials` usage.
Insecure patterns:
* `fetch(userProvidedUrl, { credentials: 'include' })`
* `axios.create({ baseURL: userProvidedBase })`
* “URL fetch/preview” features in the client that hit arbitrary domains with sensitive headers.
Detection hints:
* Search for `fetch(` / `axios(` where the first argument or `baseURL` is derived from:
* query params, localStorage, API responses, postMessage
* Search for `credentials: 'include'`, `withCredentials: true`.
Fix:
* Enforce destination allowlists; disallow cross-origin requests unless explicitly required.
* Strip credentials/Authorization headers for any non-allowlisted destination.
Notes:
* Even if the browser limits some cross-origin behavior, leaking tokens/headers to untrusted endpoints is still a common failure mode.
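The destination allowlist from the fix above can be enforced in a request-builder that refuses non-allowlisted origins outright (the fixed API origin is a hypothetical value):

```typescript
// Resolve a request path against a fixed API origin; reject anything that
// escapes the allowlist so credentials never reach attacker-chosen hosts.
const API_ORIGIN = "https://api.example.com"; // hypothetical fixed base
const EXTRA_ALLOWED = new Set<string>([]);    // explicit opt-in for other origins

function resolveRequestUrl(pathOrUrl: string): string {
  const url = new URL(pathOrUrl, API_ORIGIN);
  if (url.origin !== API_ORIGIN && !EXTRA_ALLOWED.has(url.origin)) {
    throw new Error(`blocked request to non-allowlisted origin: ${url.origin}`);
  }
  return url.toString();
}
```

Route every `fetch`/axios call through this so user input can only select a path, never a host.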
---
### REACT-REDIRECT-001: Prevent open redirects and untrusted navigation
Severity: Medium
Required:
* MUST validate redirect/navigation targets derived from untrusted input (`next`, `returnTo`, `redirect`).
* SHOULD only allow same-site relative paths, or a strict allowlist of trusted origins for absolute URLs.
Insecure patterns:
* `window.location.href = new URLSearchParams(location.search).get('next')`
* `navigate(next)` where `next` comes from query params.
Detection hints:
* Search for: `next`, `returnTo`, `redirect`, `window.location`, `navigate(`
* Trace origin of the redirect target.
Fix:
* Only allow relative paths that start with a single `/` (note that a bare `/^\/[^\s]*$/` still admits protocol-relative `//evil.example`; use e.g. `/^\/(?!\/)[^\s]*$/`) or allowlisted origins.
* Fall back to a safe default (e.g., `/`) when invalid.
Notes:
* Open redirects are frequently used in phishing and can undermine SSO/OAuth flows.
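A redirect validator per the fix above. Note that checking only for a leading `/` is not enough: `//evil.example` is scheme-relative and navigates off-site, so a second slash (and its backslash variant) must be rejected. Sketch:

```typescript
// Accept only same-site relative paths; everything else falls back to "/".
// "//host" and "/\host" are scheme-relative URLs and must be rejected.
function safeNext(raw: string | null): string {
  if (!raw) return "/";
  if (!raw.startsWith("/")) return "/"; // absolute URLs, javascript:, etc.
  if (raw.startsWith("//") || raw.startsWith("/\\")) return "/"; // protocol-relative
  return raw;
}
```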
---
### REACT-SW-001: Service workers are high-privilege; require HTTPS and safe caching/update rules
Severity: Medium
Required:
* MUST serve service workers over HTTPS (except `localhost` dev), and deploy only in secure contexts.
* MUST avoid caching sensitive authenticated API responses unless explicitly designed and threat-modeled.
* SHOULD implement safe update strategy (prompt reload, versioned caches, remove old caches on activate).
Insecure patterns:
* Registering a service worker for an authenticated app and caching “everything” indiscriminately.
* Long-lived caches containing PII or user-specific content shared across accounts.
Detection hints:
* Search for:
* `navigator.serviceWorker.register`
* `workbox`, `precacheAndRoute`, custom `fetch` handlers
* Inspect caching patterns (`caches.open`, `cache.put`, `respondWith`).
Fix:
* Restrict caching to static assets only (JS/CSS/images) unless you have a designed offline model.
* Ensure cache keys are user-scoped if user-specific data must be cached.
* Provide a clear update mechanism.
Notes:
* MDN notes service workers require HTTPS for security reasons and act like a proxy for requests/responses. ([MDN Web Docs][10])
* “Secure contexts” exist to prevent MITM attackers from accessing powerful APIs; service workers are an example of such a powerful feature. ([MDN Web Docs][18])
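The “static assets only” restriction can be enforced with a predicate the service worker's `fetch` handler consults before caching. A sketch (the extension list and `/api/` prefix are assumptions; in the worker you would derive the URL from `event.request`):

```typescript
// Decide whether a response may go into the service worker cache.
// Only same-origin static assets qualify; API responses and anything
// authenticated always go to the network.
const STATIC_EXT = /\.(js|css|png|jpe?g|svg|woff2?)$/;

function shouldCache(requestUrl: string, appOrigin: string): boolean {
  const url = new URL(requestUrl, appOrigin);
  if (url.origin !== appOrigin) return false;          // never cache third-party responses
  if (url.pathname.startsWith("/api/")) return false;  // never cache API data
  return STATIC_EXT.test(url.pathname);
}
```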
---
### REACT-HEADERS-001: Ensure essential security headers are set for the React app shell (app or edge)
Severity: Medium
Required (typical SPA served from an origin):
* SHOULD set:
* CSP (`Content-Security-Policy`)
* `X-Content-Type-Options: nosniff`
* Clickjacking protection (`frame-ancestors` in CSP and/or `X-Frame-Options`)
* `Referrer-Policy`
* `Permissions-Policy` as appropriate
* MUST ensure these are set somewhere (CDN/edge/server), even if not in repo.
Insecure patterns:
* No security headers anywhere (app or edge).
* CSP missing on apps that render untrusted content or use third-party scripts.
Detection hints:
* Check server/CDN config in repo (nginx, Cloudflare, Vercel config, etc.).
* If absent, flag as “verify at runtime/edge”.
Fix:
* Set headers centrally at the edge.
* Keep CSP realistic and iterative (report-only → enforce).
Notes:
* MDN clickjacking guidance discusses defenses including `X-Frame-Options` and CSP `frame-ancestors`. ([MDN Web Docs][8])
* OWASP CSP guidance explains delivery via response headers and recommends headers as the preferred mechanism. ([OWASP Cheat Sheet Series][2])
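Set at the edge, the headers above look roughly like this (nginx shown illustratively; the CSP value is a starting point to tighten per app):

```nginx
# Baseline security headers for the SPA shell (adjust CSP to your asset origins).
add_header Content-Security-Policy "default-src 'self'; frame-ancestors 'none'" always;
add_header X-Content-Type-Options "nosniff" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Permissions-Policy "camera=(), microphone=(), geolocation=()" always;
```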
---
### REACT-POSTMSG-001: `postMessage` must validate origin and treat payload as untrusted data
Severity: Medium to High (depends on what messages can do)
Required:
* MUST specify exact `targetOrigin` when sending messages (not `*`) unless there is a strict reason.
* MUST validate `event.origin` on receipt and validate message shape.
* MUST NOT evaluate message data as code or insert it into the DOM as HTML.
Insecure patterns:
* `window.postMessage(data, '*')` to unknown targets.
* Receiving:
* `window.addEventListener('message', (e) => { eval(e.data) })`
* `element.innerHTML = e.data`
Detection hints:
* Search: `postMessage(`, `addEventListener('message'`
* Check for origin checks and safe handling.
Fix:
* Add strict origin allowlists and schema validation (e.g., zod).
* Treat message payload strictly as data; render safely via React.
Notes:
* OWASP HTML5 guidance recommends specifying expected origin for `postMessage`, checking sender origin, validating data, and avoiding eval/innerHTML with message content. ([OWASP Cheat Sheet Series][4])
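Origin and shape validation per the fix can live in one parser the `message` listener calls before acting. Sketch (the origin allowlist and the message schema are assumptions; a library like zod can replace the hand-rolled shape check):

```typescript
// Validate a postMessage event before acting on it: exact-origin allowlist
// plus a structural check on the payload. The payload is treated as data only.
const TRUSTED_ORIGINS = new Set(["https://widget.example.com"]); // hypothetical

interface AppMessage { type: "resize"; height: number; }

function parseTrustedMessage(origin: string, data: unknown): AppMessage | null {
  if (!TRUSTED_ORIGINS.has(origin)) return null;
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Record<string, unknown>;
  if (msg.type !== "resize" || typeof msg.height !== "number") return null;
  return { type: "resize", height: msg.height };
}
```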
---
### REACT-FILE-001: File uploads and previews must not create client-side active content vulnerabilities
Severity: Medium (can be High if stored-XSS possible)
Required:
* MUST treat user-uploaded files and previews as potentially malicious.
* MUST NOT render uploaded HTML/SVG/other active content inline unless sanitized and explicitly required.
* SHOULD validate file types client-side for UX, but MUST rely on server-side validation for security.
Insecure patterns:
* Rendering user-uploaded HTML as content.
* Inline rendering of untrusted SVG/HTML via `dangerouslySetInnerHTML` or `<iframe srcdoc=...>` without sanitization.
Detection hints:
* Search for upload components and preview logic:
* `input type="file"`, `FileReader`, `URL.createObjectURL`, `<iframe>`, `<object>`, `<embed>`.
* Trace where uploaded content is later displayed.
Fix:
* Restrict accepted types, sanitize where needed, and prefer download/attachment flows for risky types.
* Ensure server enforces the real policy (type checking, renaming, scanning, storing outside webroot).
Notes:
* OWASP file upload guidance highlights allowlisting extensions, validating file type, generating filenames, limiting size, storing outside webroot, and considering “client-side active content (XSS, CSRF, etc.)” when files are publicly retrievable. ([OWASP Cheat Sheet Series][19])
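Client-side type allowlisting is UX only, but it should still agree with the server policy. A sketch in which the extension and the declared MIME type must match an allowlist (the list itself is an assumption):

```typescript
// UX-level extension/type allowlist for an upload component. This is a
// convenience filter only: the server MUST re-validate type, size, and name.
const ALLOWED_UPLOADS = new Map([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["pdf", "application/pdf"],
]);

function isAcceptableUpload(fileName: string, mimeType: string): boolean {
  const ext = fileName.toLowerCase().split(".").pop() ?? "";
  return ALLOWED_UPLOADS.get(ext) === mimeType; // extension and declared type must agree
}
```

Note that active content types (HTML, SVG) are deliberately absent from the allowlist.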
---
### REACT-SUPPLY-001: Dependency and supply-chain hygiene (frontend + build tooling)
Severity: Low
Required:
* MUST use a lockfile and enforce reproducible installs in CI.
* SHOULD regularly audit dependencies and respond quickly to advisories for:
* React, react-dom, router libs, build tooling (Vite/Webpack), sanitizers, auth libs, etc.
* SHOULD reduce exposure to install-time script attacks and typosquatting risk.
Audit focus:
* CI should use `npm ci` (or Yarn frozen lockfile / pnpm equivalent) to prevent drift.
* Use vulnerability scanning (`npm audit`, GitHub Dependabot/alerts, etc.).
Insecure patterns:
* No lockfile or lockfile ignored in CI.
* `npm install` in CI producing non-reproducible builds.
* Unpinned or unreviewed high-risk deps; sudden major updates without review.
* Blindly running install scripts from third-party packages.
Detection hints:
* Check for lockfiles: `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`.
* Check CI scripts for `npm install` vs `npm ci`.
* Search for `postinstall` scripts and suspicious build steps.
Fix:
* Use lockfile and enforce it in CI (e.g., `npm ci`).
* Run audits regularly; pin/upgrade responsibly.
* Consider restricting install scripts where feasible.
Notes:
* npm docs describe `npm audit` as submitting the project dependency tree to the registry to receive a report of known vulnerabilities and (optionally) applying remediations via `npm audit fix`, while noting some vulns require manual review. ([npm Docs][20])
* npm docs describe `npm ci` as intended for automated/CI environments, requiring an existing lockfile and failing if `package.json` and lockfile do not match. ([npm Docs][21])
* OWASP NPM security guidance recommends enforcing the lockfile and explicitly calls out `npm ci` / `yarn install --frozen-lockfile` to abort on inconsistencies, and highlights the risk of install-time scripts and the option to use `--ignore-scripts` to reduce attack surface. ([OWASP Cheat Sheet Series][22])
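In CI, the audit focus above translates to roughly this step sequence (GitHub Actions shown as an illustrative example; step names and the audit level are assumptions to adapt):

```yaml
# Reproducible install + advisory check: `npm ci` fails if package.json and
# the lockfile disagree, and --ignore-scripts reduces install-time risk.
- name: Install dependencies
  run: npm ci --ignore-scripts
- name: Audit dependencies
  run: npm audit --audit-level=high
```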
---
## 5) Practical scanning heuristics (how to “hunt”)
When actively scanning, use these high-signal patterns:
* Raw HTML / XSS escape hatches:
* `dangerouslySetInnerHTML`, `__html:`
* Markdown HTML passthrough flags: `rehype-raw`, `allowDangerousHtml`, `sanitize: false`
* DOM XSS sinks:
* `innerHTML`, `outerHTML`, `insertAdjacentHTML`, `document.write`, `DOMParser`, `createContextualFragment`
* Dangerous JS execution:
* `eval(`, `new Function(`, `setTimeout("`, `setInterval("`
* Untrusted URL injection / navigation:
* `href={` / `src={` with untrusted values
* `window.location`, `location.href`, `window.open`, `navigate(`
* Query params: `next`, `returnTo`, `redirect`
* Token/session risk:
* `localStorage.setItem`, `sessionStorage.setItem`, `getItem(` with `token`, `jwt`, `refresh`
* Cookie/CSRF coupling:
* `credentials: 'include'`, `withCredentials: true` on state-changing requests without CSRF headers
* Third-party scripts:
* `<script src=...>` in `public/index.html`
* Tag manager snippets and dynamic script insertion
* Service workers:
* `navigator.serviceWorker.register`, Workbox usage, custom `fetch` handlers
* postMessage:
* `postMessage(` with `*`, missing `event.origin` checks
* Supply chain:
* Missing lockfile, CI uses `npm install`, no audit step, risky postinstall scripts
Always try to confirm:
* data origin (untrusted vs trusted)
* sink type (React escape hatch vs DOM sink vs navigation vs storage)
* protective controls present (sanitization, allowlists, CSP/Trusted Types, CSRF tokens, headers, governance)
---
## 6) Sources (accessed 2026-01-26)
Primary React documentation:
* React 19 stable announcement — `https://react.dev/blog/2024/12/05/react-19` ([React][23])
* React DOM docs: `dangerouslySetInnerHTML` warning — `https://react.dev/reference/react-dom/components/common#dangerouslysetting-the-inner-html` ([React][12])
* React (legacy) JSX escaping statement — `https://legacy.reactjs.org/docs/introducing-jsx.html` ([React][14])
OWASP Cheat Sheet Series:
* Cross Site Scripting Prevention (framework escape hatches; React `dangerouslySetInnerHTML`; URL validation notes) — `https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][9])
* Content Security Policy — `https://cheatsheetseries.owasp.org/cheatsheets/Content_Security_Policy_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][2])
* Cross-Site Request Forgery Prevention — `https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][6])
* HTML5 Security (Web Storage, postMessage, tabnabbing, sandboxed frames) — `https://cheatsheetseries.owasp.org/cheatsheets/HTML5_Security_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][4])
* Third Party JavaScript Management — `https://cheatsheetseries.owasp.org/cheatsheets/Third_Party_Javascript_Management_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][5])
* File Upload — `https://cheatsheetseries.owasp.org/cheatsheets/File_Upload_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][19])
* NPM Security best practices — `https://cheatsheetseries.owasp.org/cheatsheets/NPM_Security_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][22])
Browser / platform references (MDN, W3C):
* Trusted Types API — `https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API` ([MDN Web Docs][3])
* W3C Trusted Types spec — `https://www.w3.org/TR/trusted-types/` ([W3C][15])
* Subresource Integrity — `https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity` ([MDN Web Docs][7])
* Clickjacking defenses overview — `https://developer.mozilla.org/en-US/docs/Web/Security/Attacks/Clickjacking` ([MDN Web Docs][8])
* Using Service Workers (HTTPS requirement; proxy-like behavior) — `https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API/Using_Service_Workers` ([MDN Web Docs][10])
* Secure contexts (powerful APIs restricted to HTTPS) — `https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Secure_Contexts` ([MDN Web Docs][18])
* Link `rel` values (noopener/noreferrer) — `https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/rel` ([MDN Web Docs][17])
Build tooling / env exposure references:
* Create React App env variables warning — `https://create-react-app.dev/docs/adding-custom-environment-variables/` ([create-react-app.dev][1])
* Vite env variables security notes — `https://vite.dev/guide/env-and-mode` ([vitejs][11])
Auth/token storage guidance:
* OAuth 2.0 for Browser-Based Apps (token storage discussion) — `https://datatracker.ietf.org/doc/html/draft-ietf-oauth-browser-based-apps` ([IETF Datatracker][16])
Dependency tooling references:
* npm audit docs — `https://docs.npmjs.com/cli/v10/commands/npm-audit/` ([npm Docs][20])
* npm ci docs — `https://docs.npmjs.com/cli/v10/commands/npm-ci/` ([npm Docs][21])
Sanitizer reference:
* DOMPurify — `https://github.com/cure53/DOMPurify` ([GitHub][13])
[1]: https://create-react-app.dev/docs/adding-custom-environment-variables/ "Adding Custom Environment Variables | Create React App"
[2]: https://cheatsheetseries.owasp.org/cheatsheets/Content_Security_Policy_Cheat_Sheet.html "Content Security Policy - OWASP Cheat Sheet Series"
[3]: https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API "Trusted Types API - Web APIs | MDN"
[4]: https://cheatsheetseries.owasp.org/cheatsheets/HTML5_Security_Cheat_Sheet.html "HTML5 Security - OWASP Cheat Sheet Series"
[5]: https://cheatsheetseries.owasp.org/cheatsheets/Third_Party_Javascript_Management_Cheat_Sheet.html "Third Party Javascript Management - OWASP Cheat Sheet Series"
[6]: https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html "Cross-Site Request Forgery Prevention - OWASP Cheat Sheet Series"
[7]: https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Subresource_Integrity "Subresource Integrity - Security | MDN"
[8]: https://developer.mozilla.org/en-US/docs/Web/Security/Attacks/Clickjacking "Clickjacking - Security | MDN"
[9]: https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html "Cross Site Scripting Prevention - OWASP Cheat Sheet Series"
[10]: https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API/Using_Service_Workers "Using Service Workers - Web APIs | MDN"
[11]: https://vite.dev/guide/env-and-mode "Env Variables and Modes | Vite"
[12]: https://react.dev/reference/react-dom/components/common "Common components (e.g. <div>) – React"
[13]: https://github.com/cure53/DOMPurify "GitHub - cure53/DOMPurify: DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo:"
[14]: https://legacy.reactjs.org/docs/introducing-jsx.html "Introducing JSX – React"
[15]: https://www.w3.org/TR/trusted-types/ "Trusted Types"
[16]: https://datatracker.ietf.org/doc/html/draft-ietf-oauth-browser-based-apps "draft-ietf-oauth-browser-based-apps-26"
[17]: https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Attributes/rel "HTML attribute: rel - HTML | MDN"
[18]: https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Secure_Contexts "Secure contexts - Security | MDN"
[19]: https://cheatsheetseries.owasp.org/cheatsheets/File_Upload_Cheat_Sheet.html "File Upload - OWASP Cheat Sheet Series"
[20]: https://docs.npmjs.com/cli/v10/commands/npm-audit "npm-audit | npm Docs"
[21]: https://docs.npmjs.com/cli/v10/commands/npm-ci "npm-ci | npm Docs"
[22]: https://cheatsheetseries.owasp.org/cheatsheets/NPM_Security_Cheat_Sheet.html "NPM Security - OWASP Cheat Sheet Series"
[23]: https://react.dev/blog/2024/12/05/react-19 "React v19 – React"


@@ -0,0 +1,791 @@
# Vue.js Web Security Spec (Vue 3.x, TypeScript/JavaScript, common tooling: Vite)
This document is designed as a **security spec** that supports:
1. **Secure-by-default code generation** for new Vue code.
2. **Security review / vulnerability hunting** in existing Vue code (passive “notice issues while working” and active “scan the repo and report findings”).
It is intentionally written as a set of **normative requirements** (“MUST/SHOULD/MAY”) plus **audit rules** (what bad patterns look like, how to detect them, and how to fix/mitigate them).
---
## 0) Safety, boundaries, and anti-abuse constraints (MUST FOLLOW)
* MUST NOT request, output, log, or commit secrets (API keys, passwords, private keys, session cookies, auth tokens).
* MUST NOT “fix” security by disabling protections (e.g., weakening CSP, turning on unsafe template compilation, using `v-html` as a shortcut, bypassing backend auth, or “just store the token in localStorage”).
* MUST provide **evidence-based findings** during audits: cite file paths, code snippets, and configuration values that justify the claim.
* MUST treat uncertainty honestly: if a protection might exist at the edge (CDN, reverse proxy, WAF, server headers), report it as “not visible in repo; verify runtime/infra config”.
* MUST remember the frontend trust model: **any code shipped to browsers is attacker-readable and attacker-modifiable**. Secrets and “security enforcement” cannot rely on frontend-only logic.
---
## 1) Operating modes
### 1.1 Generation mode (default)
When asked to write new Vue code or modify existing code:
* MUST follow every **MUST** requirement in this spec.
* SHOULD follow every **SHOULD** requirement unless the user explicitly says otherwise.
* MUST prefer safe-by-default framework features and proven libraries over custom security code.
* MUST avoid introducing new risky sinks (runtime template compilation, `v-html` / `innerHTML`, unsafe URL navigation, dynamic script injection, etc.). ([Vue.js][1])
### 1.2 Passive review mode (always on while editing)
While working anywhere in a Vue repo (even if the user did not ask for a security scan):
* MUST “notice” violations of this spec in touched/nearby code.
* SHOULD mention issues as they come up, with a brief explanation + safe fix.
### 1.3 Active audit mode (explicit scan request)
When the user asks to “scan”, “audit”, or “hunt for vulns”:
* MUST systematically search the codebase for violations of this spec.
* MUST output findings in a structured format (see §2.3).
Recommended audit order:
1. Build/deploy entrypoints and hosting config (Docker, CI, static hosting, SSR server).
2. Secrets exposure (env usage, `.env*`, hard-coded keys). ([vitejs][2])
3. XSS surface: templates, `v-html` / `innerHTML`, URL/style injection, DOM APIs. ([Vue.js][1])
4. Auth/session handling in the browser (token storage, credentialed requests, CSRF integration). ([Vue.js][1])
5. Routing/navigation (open redirects, “return_to/next”, unsafe external navigation). ([Vue.js][1])
6. Third-party scripts and content (CDN assets, analytics, widgets, iframes). ([Vue.js][1])
7. Security headers and browser hardening expectations (CSP, clickjacking). ([Vue.js][1])
8. SSR-specific concerns (state serialization, template boundaries) when applicable. ([Vue.js][1])
---
## 2) Definitions and review guidance
### 2.1 Untrusted input (treat as attacker-controlled unless proven otherwise)
In a Vue app, untrusted input includes (non-exhaustive):
* Anything from APIs: `fetch`, `axios`, GraphQL responses, webhooks, third-party SDKs.
* Router-controlled data: `route.params`, `route.query`, `route.hash`, and anything derived from `window.location`.
* User-controlled persisted content: DB-backed content displayed in the UI (comments, profiles, CMS content).
* Browser-controlled storage: `localStorage`, `sessionStorage`, `IndexedDB`.
* Cross-window messages: `postMessage` inputs.
* Anything that can be influenced by an attacker through DOM clobbering or injected HTML (especially if Vue is mounted onto non-sterile DOM). ([Vue.js][1])
### 2.2 State-changing action (frontend perspective)
An action is state-changing if it can:
* Create/update/delete data via API calls.
* Change authentication/session state (login, logout, refresh token).
* Trigger privileged operations (payments, admin actions).
* Cause side effects (sending emails, triggering webhooks, changing account settings).
### 2.3 Required audit finding format
For each issue found, output:
* Rule ID:
* Severity: Critical / High / Medium / Low
* Location: file path + component/function + line(s)
* Evidence: the exact code/config snippet
* Impact: what could go wrong, who can exploit it
* Fix: safe change (prefer minimal diff)
* Mitigation: defense-in-depth if immediate fix is hard
* False positive notes: what to verify if uncertain
---
## 3) Secure baseline: minimum production configuration (MUST in production)
This is the smallest “production baseline” that prevents common Vue/front-end misconfigurations.
* MUST ship a **production build** (not a development build or dev server). ([Vue.js][3])
* MUST NOT ship secrets in frontend bundles; treat all client-exposed env variables as public. ([vitejs][2])
* MUST NOT render non-trusted templates or allow user-provided Vue templates (equivalent to arbitrary JS execution). ([Vue.js][1])
* SHOULD avoid raw HTML injection (`v-html`, `innerHTML`) unless content is trusted or strongly sandboxed. ([Vue.js][1])
* SHOULD deploy baseline security headers (especially CSP and clickjacking defenses) at the server/CDN layer. ([OWASP Cheat Sheet Series][4])
* SHOULD use safe auth patterns (prefer HttpOnly cookies for session tokens; coordinate with backend on CSRF). ([Vue.js][1])
---
## 4) Rules (generation + audit)
Each rule contains: required practice, insecure patterns, detection hints, and remediation.
### VUE-DEPLOY-001: Do not run dev/preview servers in production
Severity: High
Required:
* MUST NOT deploy the Vite/Vue dev server (`vite`, `npm run dev`, HMR) as the production server.
* MUST NOT use `vite preview` as a production server. ([vitejs][5])
* MUST build (`vite build`) and serve the built assets using a production-grade static server/CDN, or a production SSR server if you are doing SSR. ([vitejs][6])
Insecure patterns:
* Docker/Procfile/systemd running `vite`, `npm run dev`, or `vite preview` as the production entrypoint.
* Publicly exposed HMR endpoints.
Detection hints:
* Search: `vite`, `npm run dev`, `pnpm dev`, `yarn dev`, `vite preview`, `vue-cli-service serve`.
* Check Docker `CMD`, `ENTRYPOINT`, CI deploy scripts, platform config.
Fix:
* Build artifacts with `vite build`.
* Serve `dist/` with hardened hosting (CDN/static server) or integrate into your backend server as static assets.
Notes:
* Using dev/preview servers locally is fine; only flag if it is the production entrypoint.
---
### VUE-DEPLOY-002: Use Vue production builds and keep devtools off in production
Severity: Medium (High if production devtools/debug hooks are enabled)
Required:
* If loading Vue from CDN/self-host without a bundler, MUST use the `.prod.js` builds in production. ([Vue.js][3])
* SHOULD ensure production bundles do not enable Vue devtools in production builds, and SHOULD not intentionally enable production devtools flags. ([Vue.js][7])
Insecure patterns:
* Production includes development build artifacts.
* Explicitly enabling production devtools/diagnostic hooks.
Detection hints:
* Search HTML for `vue.global.js` / non-`.prod.js` variants when using CDN builds.
* Search build config for Vue feature flags like `__VUE_PROD_DEVTOOLS__`. ([Vue.js][7])
Fix:
* Switch to production build artifacts and ensure compile-time flags are configured for production.
---
### VUE-SECRETS-001: Never ship secrets in frontend code or env variables
Severity: High (Critical if real credentials are exposed)
Required:
* MUST treat all frontend code and configuration as public.
* MUST NOT embed secrets in:
* source code
* `.env` files committed to repo
* `import.meta.env.*` variables included in the bundle
* MUST assume any env var that ends up in the client bundle is attacker-readable. ([vitejs][2])
Insecure patterns:
* `VITE_API_KEY=...` containing a true secret (not just a public identifier).
* Hard-coded API keys, private tokens, service credentials, signing keys in JS/TS.
Detection hints:
* Search: `VITE_`, `import.meta.env`, `.env`, `.env.production`, `.env.*.local`.
* Grep for `API_KEY`, `SECRET`, `TOKEN`, `PRIVATE_KEY`, `BEGIN`, `sk-`, `AKIA`, etc.
Fix:
* Move secrets to backend/edge functions.
* Use backend-minted short-lived tokens for the browser when needed.
Notes:
* Vite specifically warns that `.env.*.local` should be gitignored and that `VITE_*` vars end up in the client bundle, so they must not contain sensitive info. ([vitejs][2])
---
### VUE-SECRETS-002: Do not broaden Vite env exposure
Severity: High
Required:
* MUST NOT configure Vite to expose all environment variables to the client.
* SHOULD keep `envPrefix` strict and explicit.
Insecure patterns:
* Setting `envPrefix` to overly broad values (or `''`) to “make env vars work”.
* Custom scripts that inject server secrets into global variables in HTML at build time.
Detection hints:
* Check `vite.config.*` for `envPrefix`.
* Look for `define: { 'process.env': ... }` or manual injection into `window.__CONFIG__`.
Fix:
* Keep secrets server-side.
* Only expose non-sensitive values intentionally designed to be public.
Notes:
* Vite's docs explain that only prefixed variables are exposed and that exposed variables land in the client bundle. ([vitejs][2])
---
### VUE-XSS-001: Prefer Vue's default escaping; avoid raw HTML injection
Severity: High
Required:
* MUST rely on Vue's automatic escaping for text interpolation and attribute binding where possible. ([Vue.js][1])
* MUST NOT render user-provided HTML via:
* `v-html`
* `innerHTML` in render functions / JSX
* direct DOM APIs (`element.innerHTML`, `insertAdjacentHTML`)
unless the HTML is trusted or robustly sanitized and the risk is explicitly accepted. ([Vue.js][1])
Insecure patterns:
* `<div v-html="userProvidedHtml"></div>`
* `h('div', { innerHTML: userProvidedHtml })`
* `<div innerHTML={userProvidedHtml}></div>`
* `el.innerHTML = untrusted`
Detection hints:
* Search: `v-html`, `innerHTML`, `insertAdjacentHTML`, `DOMParser`, `document.write`.
Fix:
* Render untrusted content as text (interpolation).
* If HTML rendering is required (e.g., Markdown), sanitize with a well-maintained HTML sanitizer and apply defense-in-depth (CSP, Trusted Types). ([Vue.js][1])
Notes:
* Vue's docs explicitly warn that user-provided HTML is never “100% safe” unless it is sandboxed or exposed only to the user who authored it. ([Vue.js][1])
---
### VUE-XSS-002: Never use non-trusted templates (client-side template/code injection)
Severity: Critical
Required:
* MUST NOT use non-trusted content as a Vue component template.
* MUST treat “user can write a Vue template” as “user can execute arbitrary JavaScript in your app”, and potentially in SSR contexts too. ([Vue.js][1])
* SHOULD prefer the runtime-only build (templates compiled at build time) and avoid shipping the runtime compiler unless you have a vetted need.
Insecure patterns:
* `createApp({ template: '<div>' + userProvidedString + '</div>' }).mount(...)`
* Storing templates in DB and compiling/rendering them in the browser.
* Admin/CMS features that allow entering Vue template syntax.
Detection hints:
* Search: `template:` where the value is not a static string.
* Search: `@vue/compiler-dom`, `compile(`, “runtime compiler” build selection, dynamic SFC compilation.
* Search for “template editor”, “custom template”, “theme HTML” features.
Fix:
* Treat templates as code: keep them developer-controlled.
* If end-user customization is required, use a safe format (restricted Markdown subset) rendered via a sanitizer, or isolate in a sandboxed iframe.
---
### VUE-XSS-003: Do not mount Vue onto DOM that may contain user-provided server-rendered HTML
Severity: Medium
Required:
* MUST NOT mount Vue on nodes that may contain server-rendered and user-provided content (because attacker-controlled HTML that is “safe as HTML” may become unsafe as a Vue template). ([Vue.js][1])
* SHOULD mount Vue into a “sterile” root element and render the app's DOM from Vue-controlled templates/components.
Insecure patterns:
* Server renders user content into `#app`, then Vue mounts on `#app` and compiles/interprets that DOM as a template.
* “Sprinkling Vue” on large server-rendered pages that include user-generated content.
Detection hints:
* Check server templates (e.g., Rails/Django/Express templates) for user HTML inserted inside the Vue mount root.
* Look for `mount('#app')` where `#app` includes server-rendered UGC.
Fix:
* Move user-rendered HTML outside the Vue mount root, or render it in a safe way (text/sanitized HTML) from Vue components.
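As defense in depth, a mount wrapper can refuse roots that already contain markup. A minimal, duck-typed sketch (the helper and its name are assumptions, not a Vue API):

```typescript
// Hypothetical guard: refuse to mount Vue onto a root that already
// contains server-rendered content. Duck-typed so it works with any
// Element-like object exposing these two properties.
interface RootLike {
  childElementCount: number;
  textContent: string | null;
}

function isSterileRoot(el: RootLike): boolean {
  // A sterile root has no child elements and no non-whitespace text.
  return el.childElementCount === 0 && (el.textContent ?? "").trim() === "";
}
```

In the app entrypoint, check `isSterileRoot(document.querySelector('#app')!)` before calling `app.mount('#app')`, and fail loudly if the root is not empty.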
---
### VUE-XSS-004: Prevent URL injection in bindings and navigations
Severity: High
Required:
* MUST validate/sanitize any user-influenced URL before binding to navigation sinks (`href`, `src`, `action`, `window.location`, `window.open`, router navigation to external).
* MUST specifically prevent `javascript:` URL execution in bindings like `<a :href="userProvidedUrl">`. ([Vue.js][1])
* SHOULD validate protocol and destination (allowlist `https:` and expected hosts; allow `mailto:`/`tel:` only if intended).
Insecure patterns:
* `<iframe :src="userProvidedUrl">`
* `window.location = route.query.next`
* `window.open(userProvidedUrl)`
Detection hints:
* Search: `:href=`, `:src=`, `window.location`, `location.href`, `window.open`, `router.push(` with untrusted input.
* Look for `next`, `return_to`, `redirect` query params.
Fix:
* Prefer internal navigation via route names/paths you control.
* For external URLs: parse with `new URL(...)`, allowlist protocol/host, reject `javascript:` and other dangerous schemes.
* Sanitize and validate on the backend before storing user URLs (Vue docs explicitly recommend backend sanitization). ([Vue.js][1])
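A minimal validator for user-influenced external URLs might look like this (the function name and the host allowlist are assumptions for illustration):

```typescript
// Hypothetical helper: validate a user-influenced URL before binding it
// to :href / :src or passing it to window.open. Allows only https: and
// an explicit set of expected hosts; rejects javascript:, data:, etc.
function isSafeExternalUrl(raw: string, allowedHosts: string[]): boolean {
  let url: URL;
  try {
    // A fixed base keeps relative strings from throwing; they resolve to
    // the dummy host and then fail the allowlist check below.
    url = new URL(raw, "https://invalid.example");
  } catch {
    return false;
  }
  if (url.protocol !== "https:") return false;
  return allowedHosts.includes(url.hostname);
}
```

Loosen the protocol check to include `mailto:`/`tel:` only where those are intended features.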
---
### VUE-XSS-005: Prevent style/CSS injection and UI redress
Severity: Low
Required:
* MUST NOT bind attacker-controlled CSS strings broadly (e.g., `:style="userProvidedStyles"`).
* SHOULD use Vue's style object syntax and only allow safe, specific properties if user customization is needed. ([Vue.js][1])
* SHOULD isolate “user can control layout/CSS” features inside sandboxed iframes.
Insecure patterns:
* `:style="userProvidedStyles"` where styles are attacker-controlled.
* Rendering user-provided `<style>` content (even if Vue blocks some patterns, don't try to work around it).
Detection hints:
* Search: `:style="` bound to non-constant variables that originate from API/user content.
* Search for “custom CSS”, “theme editor”, “profile CSS”.
Fix:
* Allowlist properties and values; avoid raw style strings.
* Use sandboxed iframes for rich user customization.
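One way to allowlist user styles before binding with `:style` (the property set and value pattern below are illustrative assumptions; adjust them to the feature):

```typescript
// Hypothetical allowlist filter for user-customizable styles: keep only
// known-safe properties with simple values; drop everything else.
const ALLOWED_STYLE_PROPS = new Set(["color", "backgroundColor", "fontSize", "textAlign"]);
const SAFE_VALUE = /^[a-zA-Z0-9#%.\- ]+$/; // no url(), quotes, semicolons

function filterUserStyles(input: Record<string, unknown>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [prop, value] of Object.entries(input)) {
    if (ALLOWED_STYLE_PROPS.has(prop) && typeof value === "string" && SAFE_VALUE.test(value)) {
      out[prop] = value;
    }
  }
  return out;
}
```

Bind the result as a style object (`:style="filterUserStyles(userStyles)"`), never as a raw CSS string.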
---
### VUE-XSS-006: Never bind user-provided JavaScript into event handler attributes
Severity: Critical
Required:
* MUST NOT bind attacker-provided strings into event handler attributes (e.g., `onclick`, `onfocus`, etc.).
* MUST treat “user-provided JS” as unsafe unless sandboxed and self-only exposure is guaranteed. ([Vue.js][1])
Insecure patterns:
* `<div :onclick="userProvidedString">`
* `<a :onmouseenter="userProvidedString">`
Detection hints:
* Search: `:on` followed by event attribute names (`:onclick`, `:onload`, etc.).
* Search for `setAttribute('on` patterns.
Fix:
* Use real event listeners with developer-controlled handlers.
* If you truly need user scripting, isolate it (sandboxed iframe + strict boundaries).
---
### VUE-ROUTER-001: Do not treat client-side route guards as authorization
Severity: High
Required:
* MUST NOT rely on Vue Router guards, UI hiding, or client-side checks to enforce authorization.
* MUST enforce authorization on the backend for every privileged action and sensitive data response. ([OWASP Cheat Sheet Series][8])
Insecure patterns:
* “Admin route is protected because `beforeEach` checks `user.isAdmin`.”
* Sensitive API endpoints that assume “the frontend won't call this unless allowed.”
Detection hints:
* Search `router.beforeEach` for role-based gating and see if the backend is also enforcing.
* Look for “security by route meta” patterns (`meta.requiresAdmin`) with no server corroboration.
Fix:
* Keep route guards as UX only (reduce accidental access), but enforce real checks server-side.
---
### VUE-ROUTER-002: Prevent open redirects and unsafe “return_to/next” handling
Severity: Low
Required:
* MUST validate redirect destinations derived from untrusted input (`next`, `return_to`, `redirect`).
* SHOULD allow only same-site relative paths or an explicit allowlist of destinations.
* MUST NOT allow protocols other than `http:`/`https:` (such as `javascript:`).
Insecure patterns:
* `router.push(route.query.next as string)`
* `window.location.href = route.query.redirect`
Detection hints:
* Search for `route.query.next`, `route.query.redirect`, `return_to`, `continue`, `callback`.
* Trace the value into router/window navigation sinks.
Fix:
* Allow only relative paths starting with `/` (and reject `//host`, `javascript:`, etc.).
* Prefer redirecting to named routes you control.
Notes:
* Even Vue's docs note that a sanitized URL does not guarantee a safe destination. ([Vue.js][1])
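The relative-path check can be sketched as follows (function and parameter names are assumptions):

```typescript
// Hypothetical guard for "?next=" / "?redirect=" values: accept only
// same-site relative paths; anything else falls back to a safe default.
function safeRedirectPath(next: unknown, fallback = "/"): string {
  if (typeof next !== "string") return fallback;
  // Require exactly one leading "/": "//host" is protocol-relative, and
  // "https://..." / "javascript:..." fail the startsWith check entirely.
  // Backslashes are rejected because some parsers treat them as "/".
  if (!next.startsWith("/") || next.startsWith("//") || next.includes("\\")) {
    return fallback;
  }
  return next;
}
```

Typical usage: `router.push(safeRedirectPath(route.query.next))`.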
---
### VUE-AUTH-001: Token storage must assume XSS is possible
Severity: Low
Required:
* MUST assume any token accessible to JavaScript can be stolen via XSS.
* SHOULD prefer HttpOnly cookies (set by the backend) for session tokens, combined with CSRF protections where relevant. ([Vue.js][1])
* SHOULD avoid storing long-lived tokens (especially refresh tokens) in `localStorage`/`sessionStorage`.
Insecure patterns:
* `localStorage.setItem('token', ...)` for long-lived bearer tokens.
* Storing refresh tokens in JS-accessible storage.
Detection hints:
* Search: `localStorage`, `sessionStorage`, `indexedDB`, `persist`, `pinia-plugin-persistedstate`.
* Identify whether stored values are auth/session material.
Fix:
* Prefer backend-managed sessions via HttpOnly cookies.
* If bearer tokens are unavoidable, keep them short-lived, stored in memory, and rotate frequently; combine with strong XSS mitigations (CSP, Trusted Types, strict sanitization). ([OWASP Cheat Sheet Series][4])
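If bearer tokens are unavoidable, an in-memory holder keeps them out of persistent, JS-readable storage. A sketch (names and the refresh strategy are assumptions):

```typescript
// Hypothetical in-memory access-token holder: survives route changes but
// not page reloads, so a reload must re-authenticate (e.g., via an
// HttpOnly refresh cookie) instead of reading localStorage.
let accessToken: string | null = null;
let expiresAt = 0;

function setAccessToken(token: string, ttlSeconds: number, now = Date.now()): void {
  accessToken = token;
  expiresAt = now + ttlSeconds * 1000;
}

function getAccessToken(now = Date.now()): string | null {
  // Expired tokens read as absent, forcing the refresh path.
  return now < expiresAt ? accessToken : null;
}
```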
---
### VUE-CSRF-001: Coordinate with the backend for CSRF when using cookies
Severity: High (for cookie-authenticated state-changing requests)
NOTE: If the application does not use cookie-based authentication (for example, if it sends an `Authorization` header instead), CSRF is not a concern.
Required:
* If API requests include cookies (`credentials: 'include'` / `withCredentials: true`) and cookies authenticate the user, MUST include CSRF protections coordinated with the backend (token/header patterns, Origin checks, SameSite cookies as defense-in-depth). ([Vue.js][1])
* MUST NOT “solve CORS/CSRF errors” by disabling protections on the backend or using `mode: 'no-cors'` on the frontend.
Insecure patterns:
* `fetch(url, { credentials: 'include', method: 'POST', body: ... })` with no CSRF token/header usage anywhere.
* Enabling cross-origin credentialed requests without strict origin allowlists (backend-side).
Detection hints:
* Search: `credentials: 'include'`, `withCredentials`, `xsrf`, `csrf`, `X-CSRF-Token`, `X-XSRF-TOKEN`.
* Look at API wrapper modules for headers and cookie settings.
Fix:
* Implement backend-issued CSRF tokens and require them on state-changing requests.
* Keep cookies `SameSite=Lax/Strict` where compatible and verify Origin/Referer where appropriate (backend-driven). ([OWASP Cheat Sheet Series][9])
Notes:
* Vue's docs explicitly say CSRF is primarily addressed on the backend, but recommend coordinating with it on CSRF token submission. ([Vue.js][1])
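Attaching a backend-issued token to state-changing requests can be sketched like this (the cookie/header names `XSRF-TOKEN` and `X-XSRF-TOKEN` are common conventions, not requirements — match whatever your backend issues):

```typescript
// Hypothetical sketch: parse the backend-issued CSRF cookie out of a
// cookie string and derive the header for state-changing requests.
function xsrfTokenFromCookie(cookieHeader: string, name = "XSRF-TOKEN"): string | null {
  for (const part of cookieHeader.split(";")) {
    const [k, ...rest] = part.trim().split("=");
    if (k === name) return decodeURIComponent(rest.join("="));
  }
  return null;
}

function csrfHeaders(cookieHeader: string): Record<string, string> {
  const token = xsrfTokenFromCookie(cookieHeader);
  return token ? { "X-XSRF-TOKEN": token } : {};
}

// Usage (in the browser):
// fetch(url, { method: "POST", credentials: "include",
//   headers: { "Content-Type": "application/json", ...csrfHeaders(document.cookie) },
//   body: JSON.stringify(payload) })
```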
---
### VUE-HTTP-001: Do not put secrets in URLs; avoid leaking sensitive data in navigation/logs
Severity: Medium
Required:
* MUST NOT place tokens/secrets in query strings or fragments (they leak via logs, referrers, browser history).
* SHOULD avoid logging sensitive values to console in production.
Insecure patterns:
* `/?token=...`, `/#access_token=...` used beyond short-lived OAuth handoff.
* `console.log(userSession)` that includes tokens/PII.
Detection hints:
* Search for `token=` in router parsing, auth callback handlers, and analytics logs.
* Search for `console.log(` around auth code.
Fix:
* Use Authorization headers or HttpOnly cookies.
* Scrub logs; gate debug logs behind dev-only checks.
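A small scrubber can strip token-like material before a URL reaches logs or analytics (the parameter list is an assumption; extend it for your app):

```typescript
// Hypothetical log scrubber: redact sensitive query parameters and drop
// the fragment (which often carries OAuth handoff material) before logging.
const SENSITIVE_PARAMS = ["token", "access_token", "id_token", "code"];

function scrubUrlForLogging(raw: string): string {
  const url = new URL(raw);
  for (const p of SENSITIVE_PARAMS) {
    if (url.searchParams.has(p)) url.searchParams.set(p, "redacted");
  }
  url.hash = "";
  return url.toString();
}
```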
---
### VUE-HEADERS-001: Require security headers at the deployment layer
Severity: Medium
Required:
* SHOULD deploy a CSP (`Content-Security-Policy`) suitable for your Vue app.
* SHOULD deploy clickjacking defenses (CSP `frame-ancestors` and/or `X-Frame-Options`) unless intentional embedding is required.
* SHOULD deploy `X-Content-Type-Options: nosniff`, plus other headers as appropriate (Referrer-Policy, Permissions-Policy). ([OWASP Cheat Sheet Series][4])
Insecure patterns:
* No evidence of headers in server/CDN config for an app with UGC or rich HTML rendering.
* CSP includes `unsafe-inline`/`unsafe-eval` without strong justification.
Detection hints:
* Look for hosting config: nginx, Netlify/Vercel headers config, CloudFront/Cloudflare rules.
* If absent in repo, flag as “verify at edge”.
Fix:
* Set headers at the edge or in the server. Start with a conservative CSP and tighten.
---
### VUE-CSP-001: Use Trusted Types and DOM XSS hardening when feasible
Severity: Low
Required:
* For apps with significant DOM injection surface (rich text, plugins, `v-html`), SHOULD consider enabling Trusted Types to reduce DOM XSS risk. ([web.dev][10])
* SHOULD treat Trusted Types as defense-in-depth, not a replacement for sanitization.
Insecure patterns:
* Frequent use of `innerHTML`/`v-html` without sanitization or CSP hardening.
Detection hints:
* Search: `v-html`, `innerHTML`, `insertAdjacentHTML`.
* Check CSP for `require-trusted-types-for 'script'` usage (if headers are in repo).
Fix:
* Reduce/centralize HTML injection, sanitize inputs, and add Trusted Types policies where appropriate.
---
### VUE-THIRDPARTY-001: Avoid dynamic third-party script injection; prefer static, vetted loading
Severity: Low
Required:
* MUST NOT inject `<script src="...">` where the URL is user-controlled.
* SHOULD treat third-party widgets/analytics as supply-chain risk; load only from vetted, pinned sources.
Insecure patterns:
* `const s=document.createElement('script'); s.src = userProvidedUrl; ...`
* “Plugin marketplace” that loads arbitrary remote scripts.
Detection hints:
* Search: `createElement('script')`, `.src =`, `appendChild(script)`.
* Search for “loadExternalScript”, “injectScript”, “cdnUrl”.
Fix:
* Bundle dependencies, or allowlist strict origins and enforce integrity (see SRI rule).
* Consider sandboxed iframes for untrusted third-party UI.
---
### VUE-SRI-001: Use Subresource Integrity for CDN-hosted scripts/styles
Severity: Low
Required:
* If loading scripts/styles from a CDN, SHOULD use Subresource Integrity (`integrity` attribute) with appropriate `crossorigin` configuration. ([MDN Web Docs][11])
* SHOULD prefer self-hosting or bundling over runtime CDN dependencies for security-critical code.
Insecure patterns:
* `<script src="https://cdn.example/...">` with no `integrity`.
* Remote script URLs that can change content without version pinning.
Detection hints:
* Search `index.html` and server templates for `https://` script/style tags.
* Check for `integrity=`.
Fix:
* Add SRI hashes (and pin versions), or bundle assets with your build.
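If a script must be injected at runtime, one defensive pattern is to pin both the URL and its integrity hash in code (the URL and hash below are placeholders, not real values):

```typescript
// Hypothetical allowlist: each vetted CDN script is pinned to a version
// and an SRI hash. Anything not listed is refused outright.
const VETTED_SCRIPTS: Record<string, string> = {
  "https://cdn.example.com/widget-1.2.3.js": "sha384-PLACEHOLDER_HASH",
};

function vettedScriptAttrs(
  url: string,
): { src: string; integrity: string; crossorigin: "anonymous" } | null {
  const integrity = VETTED_SCRIPTS[url];
  return integrity ? { src: url, integrity, crossorigin: "anonymous" } : null;
}
```

Apply the returned attributes to the created `<script>` element; if the function returns `null`, do not inject.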
---
### VUE-SUPPLY-001: Dependency and patch hygiene is mandatory
Severity: Low
Required:
* SHOULD keep Vue and official companion libraries updated; Vue explicitly recommends using the latest versions to remain as secure as possible. ([Vue.js][1])
* MUST respond to security advisories promptly.
* SHOULD pin dependencies and keep lockfiles committed (to reduce drift in production artifacts).
Insecure patterns:
* Outdated major versions with known CVEs.
* No lockfile in repo; wide semver ranges for critical deps.
* Ignoring advisories for template/rendering/compiler packages.
Detection hints:
* Inspect `package.json`, lockfiles, CI install commands.
* Search for `npm audit` disabled, “ignore vulnerabilities” scripts.
Fix:
* Upgrade dependencies and add regression tests around the impacted behavior.
* Add dependency scanning in CI.
---
### VUE-SSR-001: SSR adds additional trust boundaries; treat state injection as XSS-sensitive
Severity: Medium
Required:
* When using SSR, MUST treat anything injected into the HTML document (initial state, serialized data, inline scripts) as XSS-sensitive.
* MUST keep the “trusted templates only” rule even stricter, because unsafe templates can lead to server-side execution during rendering. ([Vue.js][1])
* SHOULD follow Vue SSR documentation and best practices for SSR security. ([Vue.js][1])
Insecure patterns:
* Concatenating untrusted strings into SSR templates.
* Injecting JSON into `<script>` blocks without robust escaping/serialization controls.
Detection hints:
* Search server code for `__INITIAL_STATE__`, `window.__*STATE__`, template concatenation, and SSR render pipelines.
* Trace untrusted data into those sinks.
Fix:
* Use safe serialization patterns recommended by your SSR stack.
* Avoid rendering untrusted HTML; sanitize or isolate.
---
## 5) Practical scanning heuristics (how to “hunt”)
When actively scanning, use these high-signal patterns:
* Dev/preview servers in production:
* `npm run dev`, `vite`, `vite preview`, `vue-cli-service serve` ([vitejs][5])
* Secrets exposure:
* `.env`, `.env.production`, `.env.*.local`, `VITE_`, `import.meta.env`, hard-coded `API_KEY` / `SECRET` ([vitejs][2])
* XSS sinks:
* `v-html`, `innerHTML`, `insertAdjacentHTML`, `DOMParser`, `document.write` ([Vue.js][1])
* Client-side template injection:
* `template:` concatenation, `compile(`, runtime compiler usage, mounting on non-sterile DOM ([Vue.js][1])
* URL injection / open redirects:
* `:href="..."` / `:src="..."` from user data
* `javascript:` occurrences
* `route.query.next` / `redirect` / `return_to` flowing into `router.push` or `window.location` ([Vue.js][1])
* Style injection:
* `:style="userProvidedStyles"` or user-driven theme CSS ([Vue.js][1])
* Token storage:
* `localStorage.setItem('token'...)`, persisted auth stores, refresh tokens in JS-accessible storage
* CSRF integration red flags:
* `credentials: 'include'` / `withCredentials: true` without any CSRF header/token handling ([Vue.js][1])
* Third-party scripts:
* dynamic script injection (`createElement('script')`), CDN scripts without SRI ([MDN Web Docs][11])
* External links security:
* `target="_blank"` without `rel="noopener"`/`noreferrer` (still recommended for legacy and explicitness) ([MDN Web Docs][12])
Always try to confirm:
* data origin (untrusted vs trusted)
* sink type (HTML/DOM insertion, template compilation, URL navigation, style injection, script injection)
* protective controls present (sanitization, allowlists, CSP/Trusted Types, backend validation)
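These heuristics can be sketched as a grep-style first pass (the rule-to-regex mapping is illustrative and incomplete; a real audit still needs data-flow confirmation):

```typescript
// Hypothetical first-pass scanner: map high-signal patterns to rule IDs.
// Matches are candidates only — confirm origin, sink, and controls.
const AUDIT_PATTERNS: Array<[string, RegExp]> = [
  ["VUE-XSS-001", /v-html|innerHTML|insertAdjacentHTML|document\.write/],
  ["VUE-XSS-004", /javascript:/],
  ["VUE-SECRETS-001", /VITE_\w*(SECRET|KEY|TOKEN)|\bAKIA[0-9A-Z]{16}\b/],
  ["VUE-CSRF-001", /credentials:\s*['"]include['"]|withCredentials\s*:\s*true/],
];

function auditLine(line: string): string[] {
  return AUDIT_PATTERNS.filter(([, re]) => re.test(line)).map(([id]) => id);
}
```

Run over each source line and attach file/line locations when reporting in the §2.3 format.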
---
## 6) Sources (accessed 2026-01-27)
Primary Vue documentation:
* Vue Docs: Security — `https://vuejs.org/guide/best-practices/security` ([Vue.js][1])
* Vue Docs: Template Syntax (security warning about in-DOM templates) — `https://vuejs.org/guide/essentials/template-syntax` ([Vue.js][13])
* Vue Docs: Production Deployment — `https://vuejs.org/guide/best-practices/production-deployment` ([Vue.js][3])
* Vue Docs: Feature Flags — `https://link.vuejs.org/feature-flags` ([Vue.js][7])
Vite documentation (common Vue tooling):
* Vite Docs: Env Variables and Modes (VITE_* exposure + security notes) — `https://vite.dev/guide/env-and-mode` ([vitejs][2])
* Vite Docs: CLI (`vite preview` not designed for production) — `https://vite.dev/guide/cli` ([vitejs][5])
* Vite Docs: Server Options (`server.host` can listen on public addresses) — `https://vite.dev/config/server-options` ([vitejs][14])
OWASP and web platform hardening references:
* OWASP Cheat Sheet Series: XSS Prevention — `https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html` ([Vue.js][1])
* OWASP Cheat Sheet Series: CSRF Prevention — `https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][9])
* OWASP Cheat Sheet Series: Authorization — `https://cheatsheetseries.owasp.org/cheatsheets/Authorization_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][8])
* OWASP Cheat Sheet Series: HTTP Headers — `https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html` ([OWASP Cheat Sheet Series][4])
* HTML5 Security Cheat Sheet (referenced by Vue) — `https://html5sec.org/` ([Vue.js][1])
Browser/platform references:
* MDN: `rel="noopener"``https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Attributes/rel/noopener` ([MDN Web Docs][12])
* MDN: Subresource Integrity — `https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity` ([MDN Web Docs][11])
* web.dev: Trusted Types — `https://web.dev/trusted-types/` ([web.dev][10])
[1]: https://vuejs.org/guide/best-practices/security "https://vuejs.org/guide/best-practices/security"
[2]: https://vite.dev/guide/env-and-mode "https://vite.dev/guide/env-and-mode"
[3]: https://vuejs.org/guide/best-practices/production-deployment "https://vuejs.org/guide/best-practices/production-deployment"
[4]: https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html "https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html"
[5]: https://vite.dev/guide/cli "https://vite.dev/guide/cli"
[6]: https://vite.dev/guide/build "https://vite.dev/guide/build"
[7]: https://link.vuejs.org/feature-flags "Vue.js: Feature Flags"
[8]: https://cheatsheetseries.owasp.org/cheatsheets/Authorization_Cheat_Sheet.html "https://cheatsheetseries.owasp.org/cheatsheets/Authorization_Cheat_Sheet.html"
[9]: https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html "https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html"
[10]: https://web.dev/articles/trusted-types "https://web.dev/articles/trusted-types"
[11]: https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Subresource_Integrity "Subresource Integrity - Security - MDN Web Docs"
[12]: https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Attributes/rel/noopener "https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Attributes/rel/noopener"
[13]: https://vuejs.org/guide/essentials/template-syntax "Template Syntax | Vue.js"
[14]: https://vite.dev/config/server-options "https://vite.dev/config/server-options"
