API Reference
This section documents all modules and functions in the sw-metadata-bot package.
Analysis Runtime Module
Low-level analysis workflow helpers for pipeline orchestration.
- class sw_metadata_bot.analysis_runtime.CurrentAnalysisContext(repo_url, pitfall_file, data, pitfalls_count, warnings_count, pitfalls_ids, warnings_ids, analysis_date, rsmetacheck_version, findings_signature, has_findings, codemeta_status, codemeta_missing, codemeta_generated, generated_codemeta)
Bases: object
Parsed current-analysis state needed to build a decision record.
- Parameters:
- __init__(repo_url, pitfall_file, data, pitfalls_count, warnings_count, pitfalls_ids, warnings_ids, analysis_date, rsmetacheck_version, findings_signature, has_findings, codemeta_status, codemeta_missing, codemeta_generated, generated_codemeta)
- class sw_metadata_bot.analysis_runtime.PreviousAnalysisContext(previous_exists, previous_issue_url, previous_issue_state, previous_commit_id, previous_signature, previous_issue_open, previous_codemeta_missing, repo_updated)
Bases: object
Previous-analysis state needed for incremental decision making.
- Parameters:
- __init__(previous_exists, previous_issue_url, previous_issue_state, previous_commit_id, previous_signature, previous_issue_open, previous_codemeta_missing, repo_updated)
- sw_metadata_bot.analysis_runtime.extract_previous_commit(record)
Return previous commit id from report records with compatibility fallback.
- sw_metadata_bot.analysis_runtime.resolve_per_repo_paths(analysis_root, repo_url)
Compute per-repository output paths within the analysis root.
- sw_metadata_bot.analysis_runtime.copy_previous_repo_artifacts(previous_repo_folder, current_repo_folder)
Copy previous snapshot repository artifacts into current snapshot folder.
- sw_metadata_bot.analysis_runtime.load_previous_repo_record(previous_snapshot_root, repo_url)
Load previous per-repo record from previous snapshot if available.
- sw_metadata_bot.analysis_runtime.standardize_metacheck_outputs(repo_folder)
Normalize metacheck output names to stable per-repo filenames.
RSMetacheck outputs multiple artifacts with varying names depending on tool version and configuration. This function consolidates them into a standard naming scheme for consistent downstream processing.
Normalization Strategy (for research software clarity):
- Pitfalls (JSON-LD): Often named with repository name or timestamp. Standardized to pitfall.jsonld.
- SOMEF output: Can be nested in subdirectories or root. Standardized to somef_output.json.
- Generated codemeta: Created by rsmetacheck if metadata is missing. Standardized to codemeta_generated.json.
File discovery uses a fallback strategy:
1. Try the explicit subdirectory (metacheck’s preferred location).
2. Fall back to glob patterns if the subdirectory is empty.
3. Apply heuristics (payload inspection) to disambiguate similar files.
This defensive approach handles different metacheck versions gracefully without failing when directory structure differs from expectations.
- Parameters:
repo_folder (Path)
- Return type:
None
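The three-step fallback described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the subdirectory name "pitfalls", the "checks" key used for payload inspection, and both function names are assumptions made for the example.

```python
import json
from pathlib import Path
from typing import List, Optional


def find_pitfall_file(repo_folder: Path) -> Optional[Path]:
    """Locate a pitfalls JSON-LD file using a three-step fallback."""
    # Step 1: look in an explicit subdirectory (the name "pitfalls" is assumed).
    subdir = repo_folder / "pitfalls"
    candidates: List[Path] = sorted(subdir.glob("*.jsonld")) if subdir.is_dir() else []

    # Step 2: fall back to a recursive glob when the subdirectory yields nothing.
    if not candidates:
        candidates = sorted(repo_folder.rglob("*.jsonld"))

    # Step 3: inspect each payload and keep the first one that looks like a
    # pitfalls report (the "checks" key is an assumed marker, not a known schema).
    for candidate in candidates:
        try:
            payload = json.loads(candidate.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed file: try the next candidate
        if isinstance(payload, dict) and "checks" in payload:
            return candidate
    return None


def standardize_pitfall_name(repo_folder: Path) -> Optional[Path]:
    """Rename the discovered file to the stable name pitfall.jsonld."""
    found = find_pitfall_file(repo_folder)
    if found is None:
        return None  # nothing to normalize; do not fail
    target = repo_folder / "pitfall.jsonld"
    if found != target:
        found.replace(target)
    return target
```

Returning None instead of raising mirrors the defensive behaviour described above: a missing or oddly placed file leaves the folder untouched rather than aborting the run.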
- sw_metadata_bot.analysis_runtime.run_metacheck_for_repo(repo_url, repo_folder, *, generate_codemeta_if_missing)
Run metacheck for a single repository URL into its own folder.
- sw_metadata_bot.analysis_runtime.build_analysis_counters(records)
Build analysis counters using the unified report schema.
- sw_metadata_bot.analysis_runtime.build_analysis_run_report(records, *, dry_run, run_root, analysis_summary_file, previous_report)
Build run-level report payload from analysis decision records.
- sw_metadata_bot.analysis_runtime.detect_repo_platform(repo_url)
Detect publish platform from a repository URL.
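Platform detection from a URL might look like the sketch below. The return values ("github", "gitlab", or None) follow the convention this reference documents for config_utils.detect_platform. The host-matching rules and the function name are assumptions for illustration, and the real function may use different criteria.

```python
from typing import Optional
from urllib.parse import urlparse


def detect_platform_sketch(url: str) -> Optional[str]:
    """Guess the hosting platform from the URL's host name."""
    host = (urlparse(url).hostname or "").lower()
    if host == "github.com" or host.endswith(".github.com"):
        return "github"
    # Assumed heuristic: any host containing "gitlab" counts as a GitLab
    # instance, which also covers self-hosted installations.
    if "gitlab" in host:
        return "gitlab"
    return None
```

Matching on the parsed host rather than the whole URL avoids misclassifying a repository whose path merely contains the word "github" or "gitlab".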
- sw_metadata_bot.analysis_runtime.is_previous_issue_open(previous_record)
Infer whether previous issue was open from stored metadata only.
- sw_metadata_bot.analysis_runtime.build_record_entry(*, run_root, repo_url, platform, pitfalls_count, warnings_count, analysis_date, rsmetacheck_version, pitfalls_ids, warnings_ids, action, reason_code, findings_signature, current_commit_id, previous_commit_id, previous_issue_url, previous_issue_state, dry_run, issue_persistence, issue_url, file_path, codemeta_generated=None, codemeta_status=None, error=None)
Build a per-repository analysis record payload.
- Parameters:
run_root (Path)
repo_url (str)
platform (str | None)
pitfalls_count (int)
warnings_count (int)
analysis_date (str)
rsmetacheck_version (str)
action (str)
reason_code (str)
findings_signature (str)
current_commit_id (str | None)
previous_commit_id (str | None)
previous_issue_url (str | None)
previous_issue_state (str | None)
dry_run (bool)
issue_persistence (str)
issue_url (str | None)
file_path (Path)
codemeta_generated (bool | None)
codemeta_status (str | None)
error (str | None)
- Return type:
- sw_metadata_bot.analysis_runtime.write_analysis_repo_report(repo_folder, record, *, dry_run, run_root, analysis_summary_file, previous_report)
Write per-repository analysis report using analysis-stage counters.
Check Parsing Module
Shared parsing helpers for RSMetacheck check identifiers.
RSMetacheck evaluates each repository against a catalog of checks for metadata quality. Checks are identified by a code: P#### for Pitfalls (high-priority issues that indicate missing or invalid metadata) or W#### for Warnings (informational checks or best-practice recommendations). The #### segment is a 3-4 digit code within each category.
Example check codes:
- P001: Repository lacks codemeta.json file
- W001: Incomplete metadata field descriptions
- P042: Missing license information
See constants.py for related definitions (CHECK_TYPE_*, CHECK_CODE_REGEX_PATTERN).
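As an illustration of the code format described above, the sketch below recognizes P and W codes with a regular expression. The pattern and the function name are assumptions for this example; the package's own CHECK_CODE_REGEX_PATTERN in constants.py may be defined differently.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: a P or W prefix followed by three or four digits.
CHECK_CODE_SKETCH = re.compile(r"\b([PW])(\d{3,4})\b")


def split_check_code(text: str) -> Optional[Tuple[str, str]]:
    """Return (check type, short code) for the first code found in text."""
    match = CHECK_CODE_SKETCH.search(text)
    if match is None:
        return None
    kind = "pitfall" if match.group(1) == "P" else "warning"
    return kind, match.group(0)
```

The word boundaries keep the pattern from matching inside longer tokens, so a string such as P12345 is rejected rather than truncated to a valid-looking code.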
- sw_metadata_bot.check_parsing.get_check_catalog_id(check)
Return full RSMetacheck catalog ID URL for a check when available.
Preferred source is the new schema key assessesIndicator.@id when it points to the RSMetacheck catalog. For backward compatibility, this falls back to the legacy pitfall key.
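The lookup order described above (new schema key first, legacy key second) can be sketched as follows. The key names assessesIndicator, @id, and pitfall come from the description. The substring test used to decide whether an ID "points to the RSMetacheck catalog", the example URLs, and the function name are assumptions.

```python
from typing import Any, Dict, Optional

# Assumed marker for catalog URLs; the real check may be stricter.
CATALOG_MARKER = "rsmetacheck"


def catalog_id_sketch(check: Dict[str, Any]) -> Optional[str]:
    """Prefer assessesIndicator.@id, then fall back to the legacy pitfall key."""
    indicator = check.get("assessesIndicator")
    if isinstance(indicator, dict):
        candidate = indicator.get("@id")
        if isinstance(candidate, str) and CATALOG_MARKER in candidate.lower():
            return candidate
    # Backward compatibility: older reports store the ID under "pitfall".
    legacy = check.get("pitfall")
    if isinstance(legacy, str) and legacy:
        return legacy
    return None


# Hypothetical payloads for illustration only.
new_style = {"assessesIndicator": {"@id": "https://example.org/rsmetacheck/catalog#P001"}}
old_style = {"pitfall": "https://example.org/rsmetacheck/catalog#W004"}
```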
- sw_metadata_bot.check_parsing.get_short_check_code(check)
Return short check code such as P001 or W004.
Commit Lookup Module
Repository head commit lookup utilities.
- sw_metadata_bot.commit_lookup.parse_github_repo(repo_url)
Parse owner/repo from a GitHub repository URL.
- sw_metadata_bot.commit_lookup.resolve_gitlab_project_path(repo_url)
Parse host and project path for GitLab repositories.
- sw_metadata_bot.commit_lookup.is_commit_hash(value)
Return True if value looks like a commit hash.
- sw_metadata_bot.commit_lookup.get_github_head_commit(repo_url, token=None)
Fetch current head commit from GitHub API.
- sw_metadata_bot.commit_lookup.get_gitlab_head_commit(repo_url, token=None)
Fetch current head commit from GitLab API for gitlab* hosts.
Config Utils Module
Helpers for the unified configuration file.
- sw_metadata_bot.config_utils.normalize_repo_url(url)
Normalize repository URLs for matching and persistence.
- sw_metadata_bot.config_utils.detect_platform(url)
Detect publishing platform from repository URL.
Returns "github" for GitHub URLs, "gitlab" for any GitLab URL, or None when the URL does not match a known platform.
- sw_metadata_bot.config_utils.load_config(config_path)
Load and validate a unified configuration file.
- sw_metadata_bot.config_utils.get_repositories(config)
Return normalized repositories preserving order and uniqueness.
- sw_metadata_bot.config_utils.get_custom_message(config)
Return the configured issue custom message if present.
- sw_metadata_bot.config_utils.get_opt_out_repositories(config)
Return normalized repository URLs configured as inline opt-outs.
- sw_metadata_bot.config_utils.get_generate_codemeta_if_missing(config)
Return whether codemeta suggestions should be generated when missing.
- sw_metadata_bot.config_utils.append_opt_out_repository(config_path, repo_url)
Persist a repository to the inline opt-outs list when not already present.
- sw_metadata_bot.config_utils.resolve_output_root(config, config_path)
Return the configured output root, resolving relative paths from project root.
- sw_metadata_bot.config_utils.resolve_run_name(config, config_path)
Return the configured run name or a sensible default.
- sw_metadata_bot.config_utils.resolve_snapshot_tag(config, explicit_snapshot_tag)
Resolve the snapshot tag from CLI override or config defaults.
- sw_metadata_bot.config_utils.sanitize_repo_name(repo_url)
Sanitize repository URL to a safe folder name format.
Uses a generic URL-safe transformation so non-standard URLs still map to deterministic folder names.
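One way to obtain deterministic, filesystem-safe folder names from arbitrary URLs is sketched below. The exact transformation used by sanitize_repo_name is not documented here, so the character set, the separator, and the function name are assumptions.

```python
import re


def sanitize_repo_name_sketch(repo_url: str) -> str:
    """Map a repository URL to a folder-safe, deterministic name."""
    # Drop the scheme (https://, ssh://, ...) if one is present.
    without_scheme = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", repo_url.strip())
    # Collapse every run of unsafe characters into a single underscore.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", without_scheme)
    # Trim separators left over from leading or trailing slashes.
    return safe.strip("_")
```

Because the mapping depends only on the input string, the same URL always yields the same folder name, including URLs that do not follow the usual host/owner/repo layout.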
RSMetaCheck Wrapper Module
Wrapper for rsmetacheck CLI to integrate with sw-metadata-bot.
GitHub API Module
GitHub API client.
- class sw_metadata_bot.github_api.GitHubAPI(token=None, dry_run=False)
Bases: IssueAPIBase
Simple GitHub API client.
- verify_auth()
Verify authentication and return detailed information.
- Returns:
Dictionary with authentication details including user, scopes, and permissions.
- Return type:
GitLab API Module
GitLab API client.
- class sw_metadata_bot.gitlab_api.GitLabAPI(token=None, dry_run=False)
Bases: IssueAPIBase
Simple GitLab API client.
History Module
Helpers for loading and querying previous issue reports.
- sw_metadata_bot.history.load_previous_report(report_path)
Load report.json and index issue-lifecycle entries by repository URL.
Incremental Module
Decision engine for incremental issue lifecycle handling.
- class sw_metadata_bot.incremental.Decision(action, reason)
Bases: object
Decision outcome for a repository in incremental mode.
- sw_metadata_bot.incremental.evaluate(*, previous_exists, unsubscribed, repo_updated, has_findings, identical_findings, previous_issue_open, codemeta_missing, previous_codemeta_missing)
Evaluate the configured decision tree and return action + reason.
This function implements a cascading decision tree that determines whether to create a new issue, update an existing one with a comment, close it, or stop (skip). The logic prioritizes certain conditions to prevent unnecessary noise.
Decision Tree (evaluated in order):
1. NO PREVIOUS ANALYSIS: action="create" (first-time analysis, always create issue)
2. UNSUBSCRIBE DETECTED: action="stop" (user explicitly unsubscribed, respect their choice)
3. REPOSITORY NOT UPDATED: action="stop" (no changes since last analysis, skip)
4. MISSING CODEMETA WITHOUT OTHER FINDINGS: check whether codemeta status changed:
   - If issue open AND codemeta status unchanged: action="stop" (already reported, issue still relevant)
   - If issue open AND codemeta status changed: action="comment" (report that codemeta was added/removed)
   - If no issue open: action="create" (new codemeta issue)
5. NO FINDINGS (REPO IS CLEAN):
   - If issue is open: action="close" (metadata quality improved, close issue)
   - If no issue: action="stop" (nothing to report)
6. FINDINGS IDENTICAL TO PREVIOUS: check issue state:
   - If issue open: action="stop" (same issue already posted)
   - If issue closed: action="create" (quality got worse again after improvements)
7. FINDINGS CHANGED (DEFAULT CASE): check issue state:
   - If issue open: action="comment" (update existing issue with new findings)
   - If no issue open: action="create" (quality changed while issue is closed)
- Parameters:
previous_exists (bool) – Whether a previous analysis snapshot exists
unsubscribed (bool) – Whether unsubscribe comment detected on existing issue
repo_updated (bool) – Whether repository has new commits since last analysis
has_findings (bool) – Whether current analysis found metadata issues
identical_findings (bool) – Whether findings are identical to previous run
previous_issue_open (bool) – Whether previously opened issue is still open
codemeta_missing (bool) – Whether codemeta.json is missing in current analysis
previous_codemeta_missing (bool) – Whether codemeta.json was missing in previous analysis
- Returns:
Decision object with action and reason explaining the choice
- Return type:
Note
For research software: This decision tree is intentionally conservative, favoring skipping unnecessary issues over creating duplicate noise. Changes to this logic should be discussed as they affect user experience.
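The cascade documented for evaluate can be expressed as a chain of early returns. The sketch below follows the seven steps in the documented order. The reason strings, the class and function names, and the reading of step 4 as "codemeta is missing and has_findings is False" are assumptions, not the package's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionSketch:
    action: str  # one of "create", "comment", "close", "stop"
    reason: str  # hypothetical reason label


def evaluate_sketch(*, previous_exists: bool, unsubscribed: bool,
                    repo_updated: bool, has_findings: bool,
                    identical_findings: bool, previous_issue_open: bool,
                    codemeta_missing: bool,
                    previous_codemeta_missing: bool) -> DecisionSketch:
    if not previous_exists:                       # step 1
        return DecisionSketch("create", "no_previous_analysis")
    if unsubscribed:                              # step 2
        return DecisionSketch("stop", "unsubscribed")
    if not repo_updated:                          # step 3
        return DecisionSketch("stop", "repo_not_updated")
    if codemeta_missing and not has_findings:     # step 4 (assumed reading)
        if not previous_issue_open:
            return DecisionSketch("create", "codemeta_missing")
        if codemeta_missing == previous_codemeta_missing:
            return DecisionSketch("stop", "codemeta_unchanged")
        return DecisionSketch("comment", "codemeta_changed")
    if not has_findings:                          # step 5
        if previous_issue_open:
            return DecisionSketch("close", "no_findings")
        return DecisionSketch("stop", "no_findings")
    if identical_findings:                        # step 6
        if previous_issue_open:
            return DecisionSketch("stop", "identical_findings")
        return DecisionSketch("create", "identical_findings_issue_closed")
    if previous_issue_open:                       # step 7
        return DecisionSketch("comment", "findings_changed")
    return DecisionSketch("create", "findings_changed_issue_closed")
```

Ordering the checks as early returns makes the priority explicit: an unsubscribe or an unchanged repository short-circuits everything that follows, which matches the conservative intent stated in the note above.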
Main Module
CLI entry point for sw-metadata-bot.
Pipeline Module
Pipeline command to run analysis workflows.
- sw_metadata_bot.pipeline.find_latest_previous_report(output_root, run_name, current_snapshot_tag)
Find latest previous report path from same run folder.
- sw_metadata_bot.pipeline.run_pipeline(config_file, dry_run, snapshot_tag, previous_report, force_analysis=False)
Run analysis and write issue decision records without API side effects.
When force_analysis is True, the pipeline will bypass artifact reuse for unchanged repositories and treat them as if the repository was updated.
Pitfalls Module
Pitfalls data loading and parsing.
- sw_metadata_bot.pitfalls.get_repository_url(data)
Extract repository URL from pitfalls data.
- sw_metadata_bot.pitfalls.get_rsmetacheck_version(data)
Get the version of RSMetacheck used for analysis. The version is read from checkingSoftware.softwareVersion and falls back to “unknown” if not found.
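The nested lookup with a fallback can be sketched as below. The key path checkingSoftware.softwareVersion and the "unknown" fallback come from the description above; the function name and the type checks are illustrative assumptions.

```python
from typing import Any, Dict


def rsmetacheck_version_sketch(data: Dict[str, Any]) -> str:
    """Read checkingSoftware.softwareVersion, defaulting to "unknown"."""
    software = data.get("checkingSoftware")
    if isinstance(software, dict):
        version = software.get("softwareVersion")
        if isinstance(version, str) and version:
            return version
    # Missing key, wrong type, or empty string: report "unknown".
    return "unknown"
```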
Publish Module
Publish issues from an existing analysis snapshot.
Token Resolver Module
Token resolution helpers for API clients.
Verify Tokens Module
Token verification command - check if tokens have correct permissions.