fix: enhance download-docs skill to handle working-directory and update metadata file extension

Hans Aschauer 2026-05-18 07:33:19 +02:00
parent 8288787b4e
commit 02931b70d5

@@ -67,8 +67,11 @@ Fetch each via raw URL:
https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{ci_file}
```
-Scan content for keywords like `ref:`, `branch:`, `gh-pages`, `checkout`.
+Scan content for keywords like `ref:`, `branch:`, `gh-pages`, `checkout`,
+`working-directory`.
If a specific docs branch is found, update `BRANCH` and re-run Step 2.
+If a `working-directory:` line is found (e.g. `working-directory: ./www`),
+extract that path and prepend it to `DOC_LOCATIONS` so it is tried first.
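
The `working-directory` handling described in this hunk could be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the function name `prepend_working_dirs`, the regular expression, and the sample `DOC_LOCATIONS` values are all assumptions made for the example.

```python
import posixpath
import re

# Hypothetical pattern: matches "working-directory: <path>" at the start of a
# YAML line, with an optional list dash and optional quotes around the path.
WORKDIR_RE = re.compile(
    r"^\s*(?:-\s*)?working-directory:\s*['\"]?([^'\"#\s]+)", re.MULTILINE
)


def prepend_working_dirs(ci_text, doc_locations):
    """Return a new list with working-directory paths from ci_text placed first."""
    found = []
    for match in WORKDIR_RE.finditer(ci_text):
        # Normalise "./www" or "www/" to "www"; skip the repository root itself.
        path = posixpath.normpath(match.group(1))
        if path != "." and path not in found:
            found.append(path)
    # Keep the remaining candidates in their original order, without duplicates.
    rest = [loc for loc in doc_locations if loc not in found]
    return found + rest


# Illustrative values only; the real DOC_LOCATIONS is not shown in this diff.
DOC_LOCATIONS = ["docs", "doc", "www"]
ci_file = (
    "jobs:\n"
    "  build:\n"
    "    steps:\n"
    "      - run: npm ci\n"
    "        working-directory: ./www\n"
)
DOC_LOCATIONS = prepend_working_dirs(ci_file, DOC_LOCATIONS)
print(DOC_LOCATIONS)
```

Returning a new list instead of mutating the input keeps the original search order available if the CI file turns out to be irrelevant.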
### Step 4 — Recursive download
@@ -104,7 +107,7 @@ For each downloaded file:
1. Reconstruct the relative path under `{ARTIFACT_DIR}/{repo}/{file_path}`.
2. Create parent directories with `Path.mkdir(parents=True, exist_ok=True)`.
3. Write file content (UTF-8, errors=`replace`).
-4. Write `.meta.json` sidecar at `{out_path}.meta.json`.
+4. Write `.json` sidecar at `{out_path.with_suffix('.json')}`.
**Metadata fields**:
```json
@@ -257,7 +260,7 @@ def process_dir(api_path):
"content_type": r.get("content_type", "text/plain"),
"downloaded_at": now_iso,
}
-meta_path = out_path.parent / (out_path.name + ".meta.json")
+meta_path = out_path.with_suffix(".json")
meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
downloaded += 1
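
To make the effect of the sidecar rename concrete, the sketch below compares the two path expressions from this hunk on a few sample file names. The helper names `old_meta_path` and `new_meta_path` and the sample paths are assumptions for illustration. The `pathlib` behaviour is standard: `with_suffix` replaces the final suffix, so files that differ only by extension map to the same sidecar, and a downloaded `.json` file maps to itself. Whether that matters for this skill depends on the repositories it is run against.

```python
from pathlib import Path


def old_meta_path(out_path):
    # Previous behaviour: append ".meta.json" to the full file name.
    return out_path.parent / (out_path.name + ".meta.json")


def new_meta_path(out_path):
    # New behaviour: replace the final suffix with ".json".
    return out_path.with_suffix(".json")


# Hypothetical downloaded files under an illustrative artifact directory.
samples = ["docs/index.md", "docs/index.rst", "docs/config.json", "docs/archive.tar.gz"]
for name in samples:
    out_path = Path("artifacts/repo") / name
    print(f"{out_path.name}: {old_meta_path(out_path).name} | {new_meta_path(out_path).name}")
```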