fix: enhance download-docs skill to handle working-directory and update metadata file extension

Hans Aschauer 2026-05-18 07:33:19 +02:00
parent 8288787b4e
commit 02931b70d5


@@ -67,8 +67,11 @@ Fetch each via raw URL:
 https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{ci_file}
 ```
-Scan content for keywords like `ref:`, `branch:`, `gh-pages`, `checkout`.
+Scan content for keywords like `ref:`, `branch:`, `gh-pages`, `checkout`,
+`working-directory`.
 If a specific docs branch is found, update `BRANCH` and re-run Step 2.
+If a `working-directory:` line is found (e.g. `working-directory: ./www`),
+extract that path and prepend it to `DOC_LOCATIONS` so it is tried first.
 ### Step 4 — Recursive download
@@ -104,7 +107,7 @@ For each downloaded file:
 1. Reconstruct the relative path under `{ARTIFACT_DIR}/{repo}/{file_path}`.
 2. Create parent directories with `Path.mkdir(parents=True, exist_ok=True)`.
 3. Write file content (UTF-8, errors=`replace`).
-4. Write `.meta.json` sidecar at `{out_path}.meta.json`.
+4. Write `.json` sidecar at `{out_path.with_suffix('.json')}`.
 **Metadata fields**:
 ```json
@@ -257,7 +260,7 @@ def process_dir(api_path):
 "content_type": r.get("content_type", "text/plain"),
 "downloaded_at": now_iso,
 }
-meta_path = out_path.parent / (out_path.name + ".meta.json")
+meta_path = out_path.with_suffix(".json")
 meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
 downloaded += 1
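
The `working-directory` step added in the first hunk is described only in prose, so here is a rough sketch of how it might be implemented. It is an illustration, not code from this commit: `extract_working_dirs`, `prepend_doc_locations`, the regular expression, and the sample `DOC_LOCATIONS` values are all hypothetical. It assumes a simple line-based scan of the CI file rather than a YAML parser, and Python 3.9 or later.

```python
import re

# Hypothetical pattern: matches lines such as `working-directory: ./www`,
# optionally preceded by a list dash and with optional quotes around the value.
WORKDIR_RE = re.compile(
    r"""^\s*(?:-\s+)?working-directory:\s*['"]?([^'"#\s]+)""",
    re.MULTILINE,
)


def extract_working_dirs(ci_text: str) -> list[str]:
    """Return normalized working-directory paths in order of first appearance."""
    found: list[str] = []
    for raw in WORKDIR_RE.findall(ci_text):
        path = raw.removeprefix("./").strip("/")
        # Skip the repository root and expressions such as ${{ matrix.dir }}.
        if not path or path == "." or "$" in path:
            continue
        if path not in found:
            found.append(path)
    return found


def prepend_doc_locations(doc_locations: list[str], ci_text: str) -> list[str]:
    """Put paths found in the CI file first, keeping the remaining order."""
    dirs = extract_working_dirs(ci_text)
    return dirs + [loc for loc in doc_locations if loc not in dirs]


ci_yaml = """
jobs:
  build:
    defaults:
      run:
        working-directory: ./www
    steps:
      - uses: actions/checkout@v4
"""

# Placeholder values; the skill's real DOC_LOCATIONS list is not shown in this diff.
DOC_LOCATIONS = ["docs", "doc", "www"]
print(prepend_doc_locations(DOC_LOCATIONS, ci_yaml))
```

With the sample input this prints `['www', 'docs', 'doc']`: the extracted path moves to the front and is not duplicated further down the list. Values containing `$` are skipped because workflow expressions cannot be resolved by a plain text scan.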