API Reference

This section documents all modules and functions in the sw-metadata-bot package.

Analysis Runtime Module

Low-level analysis workflow helpers for pipeline orchestration.

class sw_metadata_bot.analysis_runtime.CurrentAnalysisContext(repo_url, pitfall_file, data, pitfalls_count, warnings_count, pitfalls_ids, warnings_ids, analysis_date, rsmetacheck_version, findings_signature, has_findings, codemeta_status, codemeta_missing, codemeta_generated, generated_codemeta)

Bases: object

Parsed current-analysis state needed to build a decision record.

Parameters:

repo_url (str)
pitfall_file (Path)
data (dict[str, Any])
pitfalls_count (int)
warnings_count (int)
pitfalls_ids (list[str])
warnings_ids (list[str])
analysis_date (str)
rsmetacheck_version (str)
findings_signature (str)
has_findings (bool)
codemeta_status (str)
codemeta_missing (bool)
codemeta_generated (bool)
generated_codemeta (dict[str, Any] | None)

repo_url: str

pitfall_file: Path

data: dict[str, Any]

pitfalls_count: int

warnings_count: int

pitfalls_ids: list[str]

warnings_ids: list[str]

analysis_date: str

rsmetacheck_version: str

findings_signature: str

has_findings: bool

codemeta_status: str

codemeta_missing: bool

codemeta_generated: bool

generated_codemeta: dict[str, Any] | None

__init__(repo_url, pitfall_file, data, pitfalls_count, warnings_count, pitfalls_ids, warnings_ids, analysis_date, rsmetacheck_version, findings_signature, has_findings, codemeta_status, codemeta_missing, codemeta_generated, generated_codemeta)

Parameters:

repo_url (str)
pitfall_file (Path)
data (dict[str, Any])
pitfalls_count (int)
warnings_count (int)
pitfalls_ids (list[str])
warnings_ids (list[str])
analysis_date (str)
rsmetacheck_version (str)
findings_signature (str)
has_findings (bool)
codemeta_status (str)
codemeta_missing (bool)
codemeta_generated (bool)
generated_codemeta (dict[str, Any] | None)

Return type:

None

class sw_metadata_bot.analysis_runtime.PreviousAnalysisContext(previous_exists, previous_issue_url, previous_issue_state, previous_commit_id, previous_signature, previous_issue_open, previous_codemeta_missing, repo_updated)

Bases: object

Previous-analysis state needed for incremental decision making.

Parameters:

previous_exists (bool)
previous_issue_url (str | None)
previous_issue_state (str | None)
previous_commit_id (str | None)
previous_signature (str)
previous_issue_open (bool)
previous_codemeta_missing (bool)
repo_updated (bool)

previous_exists: bool

previous_issue_url: str | None

previous_issue_state: str | None

previous_commit_id: str | None

previous_signature: str

previous_issue_open: bool

previous_codemeta_missing: bool

repo_updated: bool

__init__(previous_exists, previous_issue_url, previous_issue_state, previous_commit_id, previous_signature, previous_issue_open, previous_codemeta_missing, repo_updated)

Parameters:

previous_exists (bool)
previous_issue_url (str | None)
previous_issue_state (str | None)
previous_commit_id (str | None)
previous_signature (str)
previous_issue_open (bool)
previous_codemeta_missing (bool)
repo_updated (bool)

Return type:

None

sw_metadata_bot.analysis_runtime.extract_previous_commit(record)

Return previous commit id from report records with compatibility fallback.

Parameters: record (dict)
Return type: str | None

sw_metadata_bot.analysis_runtime.resolve_per_repo_paths(analysis_root, repo_url)

Compute per-repository output paths within the analysis root.

Parameters:

analysis_root (Path)
repo_url (str)

Return type:

dict[str, Path]

sw_metadata_bot.analysis_runtime.copy_previous_repo_artifacts(previous_repo_folder, current_repo_folder)

Copy previous snapshot repository artifacts into current snapshot folder.

Parameters:

previous_repo_folder (Path)
current_repo_folder (Path)

Return type:

None

sw_metadata_bot.analysis_runtime.load_previous_repo_record(previous_snapshot_root, repo_url)

Load previous per-repo record from previous snapshot if available.

Parameters:

previous_snapshot_root (Path | None)
repo_url (str)

Return type:

dict | None

sw_metadata_bot.analysis_runtime.standardize_metacheck_outputs(repo_folder)

Normalize metacheck output names to stable per-repo filenames.

RSMetacheck outputs multiple artifacts with varying names depending on tool version and configuration. This function consolidates them into a standard naming scheme for consistent downstream processing.

Normalization strategy (for research software clarity):

- Pitfalls (JSON-LD): often named after the repository or a timestamp. Standardized to pitfall.jsonld.
- SOMEF output: can be nested in a subdirectory or placed at the root. Standardized to somef_output.json.
- Generated codemeta: created by rsmetacheck if metadata is missing. Standardized to codemeta_generated.json.

File discovery uses a fallback strategy:

1. Try the explicit subdirectory (metacheck’s preferred location).
2. Fall back to glob patterns if the subdirectory is empty.
3. Apply heuristics (payload inspection) to disambiguate similar files.

This defensive approach handles different metacheck versions gracefully, without failing when the directory structure differs from expectations.

Parameters: repo_folder (Path)
Return type: None
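The fallback discovery order above can be sketched with pathlib as follows. This is a minimal illustration, not the package's implementation: the helper name find_artifact, its parameters, and the example subdirectory and glob pattern are hypothetical, and the payload-inspection heuristic is reduced to a caller-supplied predicate.

```python
from pathlib import Path
from typing import Callable, Optional


def find_artifact(
    repo_folder: Path,
    subdir: str,
    pattern: str,
    looks_right: Callable[[Path], bool] = lambda p: True,
) -> Optional[Path]:
    """Hypothetical helper: locate one artifact using a fallback strategy."""
    # 1. Prefer the explicit subdirectory when it exists and has matches.
    preferred = repo_folder / subdir
    candidates = sorted(preferred.glob(pattern)) if preferred.is_dir() else []
    # 2. Fall back to a recursive glob over the whole repository folder.
    if not candidates:
        candidates = sorted(repo_folder.rglob(pattern))
    # 3. Disambiguate similar files with a caller-supplied payload check.
    for path in candidates:
        if path.is_file() and looks_right(path):
            return path
    return None
```

A caller would then move the returned path to its standardized name (for example pitfall.jsonld). Keeping the heuristic as a predicate lets each artifact type plug in its own payload check.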

sw_metadata_bot.analysis_runtime.run_metacheck_for_repo(repo_url, repo_folder, *, generate_codemeta_if_missing)

Run metacheck for a single repository URL into its own folder.

Parameters:

repo_url (str)
repo_folder (Path)
generate_codemeta_if_missing (bool)

Return type:

None

sw_metadata_bot.analysis_runtime.build_analysis_counters(records)

Build analysis counters using the unified report schema.

Parameters: records (list[dict[str, object]])
Return type: dict[str, int]

sw_metadata_bot.analysis_runtime.build_analysis_run_report(records, *, dry_run, run_root, analysis_summary_file, previous_report, input_config_file=None)

Build run-level report payload from analysis decision records.

Parameters:

records (list[dict[str, object]])
dry_run (bool)
run_root (Path)
analysis_summary_file (Path)
previous_report (Path | None)
input_config_file (Path | None)

Return type:

dict[str, object]

sw_metadata_bot.analysis_runtime.detect_repo_platform(repo_url)

Detect publish platform from a repository URL.

Parameters: repo_url (str)
Return type: str | None
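A minimal sketch of URL-based platform detection, assuming the host name is the deciding signal: github.com maps to "github", any host starting with "gitlab" (mirroring the gitlab* host convention noted for get_gitlab_head_commit) maps to "gitlab", and anything else yields None. The function name and the returned strings are assumptions, not the documented values.

```python
from typing import Optional
from urllib.parse import urlparse


def detect_platform_sketch(repo_url: str) -> Optional[str]:
    """Hypothetical stand-in for detect_repo_platform."""
    # urlparse(...).hostname is already lower-cased, or None for non-URLs.
    host = urlparse(repo_url.strip()).hostname or ""
    if host in ("github.com", "www.github.com"):
        return "github"
    if host.startswith("gitlab"):
        return "gitlab"
    return None
```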

sw_metadata_bot.analysis_runtime.is_previous_issue_open(previous_record)

Infer whether the previous issue was open, using stored metadata only.

Parameters: previous_record (dict[str, object])
Return type: bool

sw_metadata_bot.analysis_runtime.build_record_entry(*, run_root, repo_url, platform, pitfalls_count, warnings_count, analysis_date, rsmetacheck_version, pitfalls_ids, warnings_ids, action, reason_code, findings_signature, current_commit_id, previous_commit_id, previous_issue_url, previous_issue_state, dry_run, issue_persistence, issue_url, file_path, codemeta_generated=None, codemeta_status=None, error=None)

Build a per-repository analysis record payload.

Parameters:

run_root (Path)
repo_url (str)
platform (str | None)
pitfalls_count (int)
warnings_count (int)
analysis_date (str)
rsmetacheck_version (str)
pitfalls_ids (list[str])
warnings_ids (list[str])
action (str)
reason_code (str)
findings_signature (str)
current_commit_id (str | None)
previous_commit_id (str | None)
previous_issue_url (str | None)
previous_issue_state (str | None)
dry_run (bool)
issue_persistence (str)
issue_url (str | None)
file_path (Path)
codemeta_generated (bool | None)
codemeta_status (str | None)
error (str | None)

Return type:

dict[str, object]

sw_metadata_bot.analysis_runtime.write_analysis_repo_report(repo_folder, record, *, dry_run, run_root, analysis_summary_file, previous_report)

Write per-repository analysis report using analysis-stage counters.

Parameters:

repo_folder (Path)
record (dict[str, object])
dry_run (bool)
run_root (Path)
analysis_summary_file (Path)
previous_report (Path | None)

Return type:

None

sw_metadata_bot.analysis_runtime.create_analysis_record(*, run_root, repo_url, repo_folder, previous_record, current_commit_id, dry_run, custom_message, force_analysis=False)

Create a decision record for a repository without platform API calls.

Parameters:

run_root (Path)
repo_url (str)
repo_folder (Path)
previous_record (dict[str, object] | None)
current_commit_id (str | None)
dry_run (bool)
custom_message (str | None)
force_analysis (bool)

Return type:

dict[str, object]

Check Parsing Module

Shared parsing helpers for RSMetacheck check identifiers.

RSMetacheck evaluates each repository against a catalog of metadata-quality checks. Each check is identified by a code: P followed by a number for Pitfalls (high-priority issues that indicate missing or invalid metadata), or W followed by a number for Warnings (informational checks or best-practice recommendations). The numeric part is a 3- or 4-digit code within each category.

Example check codes:

- P001: Repository lacks codemeta.json file
- W001: Incomplete metadata field descriptions
- P042: Missing license information

See constants.py for related definitions (CHECK_TYPE_*, CHECK_CODE_REGEX_PATTERN).
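The code format above can be matched with a regular expression. The pattern and helper below are illustrative assumptions; the package's actual CHECK_CODE_REGEX_PATTERN and CHECK_TYPE_* constants live in constants.py and may differ.

```python
import re
from typing import Optional

# Hypothetical pattern: category letter (P or W) followed by 3 or 4 digits.
CHECK_CODE_RE = re.compile(r"\b([PW])(\d{3,4})\b")


def classify_check_code(text: str) -> Optional[tuple[str, str]]:
    """Return (category, code) for the first check code found in text, else None."""
    match = CHECK_CODE_RE.search(text)
    if match is None:
        return None
    category = "pitfall" if match.group(1) == "P" else "warning"
    return category, match.group(0)
```

Searching rather than full-matching lets the same helper pull a short code out of a longer catalog URL.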

sw_metadata_bot.check_parsing.get_check_catalog_id(check)

Return the full RSMetacheck catalog ID URL for a check, when available.

The preferred source is the new schema key assessesIndicator.@id, when it points to the RSMetacheck catalog. For backward compatibility, the function falls back to the legacy pitfall key.

Parameters: check (dict)
Return type: str

sw_metadata_bot.check_parsing.get_short_check_code(check)

Return a short check code such as P001 or W004.

Parameters: check (dict)
Return type: str

sw_metadata_bot.check_parsing.is_check_reported(check)

Return True only when a check is explicitly reported by metacheck.

Verbose metacheck output marks each evaluated check with an output key. Only values representing true are considered reported findings.

Parameters: check (dict)
Return type: bool
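A sketch of the "explicitly reported" test, assuming the output key is literally named "output" and that true may arrive either as a JSON boolean or as a string such as "true". Neither assumption is confirmed by this reference; the function name is hypothetical.

```python
from typing import Any


def is_reported_sketch(check: dict[str, Any]) -> bool:
    """Hypothetical: True only when check["output"] represents true."""
    value = check.get("output")
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() == "true"
    # Missing keys, None, numbers, and other types count as not reported.
    return False
```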

sw_metadata_bot.check_parsing.extract_check_ids(checks)

Extract ordered unique pitfall and warning codes from check entries.

Parameters: checks (list[dict])
Return type: tuple[list[str], list[str]]
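Ordered de-duplication of the kind extract_check_ids describes can be done with dict.fromkeys, which keeps the first occurrence of each key in insertion order. The sketch assumes each check entry already exposes its short code under a hypothetical "code" key; the real function presumably derives codes from the check payload (for example via get_short_check_code).

```python
from typing import Any


def split_check_ids(checks: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Hypothetical: return (pitfall_codes, warning_codes), ordered and unique."""
    codes = [str(check.get("code", "")) for check in checks]
    # dict.fromkeys drops duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(code for code in codes if code))
    pitfall_codes = [code for code in unique if code.startswith("P")]
    warning_codes = [code for code in unique if code.startswith("W")]
    return pitfall_codes, warning_codes
```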

Commit Lookup Module

Repository head commit lookup utilities.

sw_metadata_bot.commit_lookup.parse_github_repo(repo_url)

Parse owner/repo from a GitHub repository URL.

Parameters: repo_url (str)
Return type: tuple[str, str] | None
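A minimal sketch of owner/repo parsing with urllib.parse. It assumes only github.com URLs are accepted, strips an optional .git suffix and any trailing path segments (such as /tree/main), and returns None otherwise; the real function's exact acceptance rules are not documented here, and the name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def parse_github_repo_sketch(repo_url: str) -> Optional[tuple[str, str]]:
    """Hypothetical stand-in for commit_lookup.parse_github_repo."""
    parsed = urlparse(repo_url.strip())
    if (parsed.hostname or "") != "github.com":
        return None
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return (owner, repo) if repo else None
```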

sw_metadata_bot.commit_lookup.resolve_gitlab_project_path(repo_url)

Parse host and project path for GitLab repositories.

Parameters: repo_url (str)
Return type: tuple[str, str] | None

sw_metadata_bot.commit_lookup.is_commit_hash(value)

Return True if value looks like a commit hash.

Parameters: value (str)
Return type: bool
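One plausible heuristic for "looks like a commit hash" is a hexadecimal string between abbreviated and full length. The 7 to 40 character range below assumes SHA-1 object names (Git's default, where 7 is a common abbreviation length); repositories using SHA-256 object names have 64-character IDs, so the real check may differ. The function name is hypothetical.

```python
import re

# Hypothetical heuristic: 7 to 40 hex characters (abbreviated or full SHA-1).
_HEX_SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def looks_like_commit_hash(value: str) -> bool:
    """Return True when value is a plausible SHA-1 commit ID."""
    return bool(_HEX_SHA_RE.fullmatch(value.strip().lower()))
```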

sw_metadata_bot.commit_lookup.get_github_head_commit(repo_url, token=None)

Fetch current head commit from GitHub API.

Parameters:

repo_url (str)
token (str | None)

Return type:

str | None

sw_metadata_bot.commit_lookup.get_gitlab_head_commit(repo_url, token=None)

Fetch current head commit from GitLab API for gitlab* hosts.

Parameters:

repo_url (str)
token (str | None)

Return type:

str | None

sw_metadata_bot.commit_lookup.get_generic_git_head_commit(repo_url)

Fetch current head commit via git ls-remote as generic fallback.

Parameters: repo_url (str)
Return type: str | None

sw_metadata_bot.commit_lookup.get_repo_head_commit(repo_url)

Fetch current head commit using API-first and git fallback strategies.

Parameters: repo_url (str)
Return type: str | None
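The API-first, git-fallback strategy amounts to trying lookups in order and keeping the first usable result. The sketch below models each lookup as a callable and adds a parser for the "<sha> TAB HEAD" line that git ls-remote <url> HEAD prints. Both helper names are hypothetical; the real functions also handle tokens, timeouts, and platform detection.

```python
from typing import Callable, Iterable, Optional

Lookup = Callable[[str], Optional[str]]


def parse_ls_remote_head(output: str) -> Optional[str]:
    """Extract the SHA from 'git ls-remote <url> HEAD' output, if present."""
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if ref.strip() == "HEAD" and sha:
            return sha.strip()
    return None


def first_head_commit(repo_url: str, lookups: Iterable[Lookup]) -> Optional[str]:
    """Return the first non-empty commit ID produced by the ordered lookups."""
    for lookup in lookups:
        try:
            commit = lookup(repo_url)
        except Exception:  # a failing strategy must not block later fallbacks
            commit = None
        if commit:
            return commit
    return None
```

In this arrangement the GitHub or GitLab API lookup would be listed first and a subprocess-based git ls-remote lookup last, so network-light API calls are preferred and git is only invoked when they yield nothing.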

Config Utils Module

RSMetaCheck Wrapper Module

Wrapper for rsmetacheck CLI to integrate with sw-metadata-bot.

sw_metadata_bot.rsmetacheck_wrapper.run_rsmetacheck(*, input_source, skip_somef=False, somef_output='somef_outputs', pitfalls_output='pitfalls_outputs', analysis_output='analysis_results.json', threshold=0.8, generate_codemeta=False)

Run rsmetacheck CLI by constructing and forwarding argv.

Parameters:

input_source (str)
skip_somef (bool)
somef_output (str)
pitfalls_output (str)
analysis_output (str)
threshold (float)
generate_codemeta (bool)

Return type:

None

GitHub API Module

GitHub API client.

class sw_metadata_bot.github_api.GitHubAPI(token=None, dry_run=False)

Bases: IssueAPIBase

Simple GitHub API client.

Parameters:

token (str | None)
dry_run (bool)

__init__(token=None, dry_run=False)

Initialize GitHub API client.

Parameters:

token (str | None)
dry_run (bool)

static parse_repo_url(url)

Parse GitHub URL to extract owner and repo.

Returns: Tuple of (owner, repo_name)
Parameters: url (str)
Return type: tuple[str, str]

check_auth()

Check whether authentication works.

Return type: bool

verify_auth()

Verify authentication and return detailed information.

Returns: Dictionary with authentication details including user, scopes, and permissions.
Return type: dict

create_issue(repo_url, title, body)

Create an issue on GitHub.

Returns:

URL of created issue (or fake URL in dry-run mode)

Parameters:

repo_url (str)
title (str)
body (str)

Return type:

str

static parse_issue_url(issue_url)

Parse a GitHub issue URL and return owner/repo/number.

Parameters: issue_url (str)
Return type: tuple[str, str, int]
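A sketch of issue-URL parsing for the https://github.com/<owner>/<repo>/issues/<number> form, using a regular expression. Raising ValueError on anything else is an assumption, since this reference does not say how invalid URLs are reported; the function name is hypothetical.

```python
import re

_GITHUB_ISSUE_RE = re.compile(
    r"^https?://(?:www\.)?github\.com/([^/]+)/([^/]+)/issues/(\d+)/?$"
)


def parse_github_issue_url_sketch(issue_url: str) -> tuple[str, str, int]:
    """Hypothetical: split a GitHub issue URL into (owner, repo, number)."""
    match = _GITHUB_ISSUE_RE.match(issue_url.strip())
    if match is None:
        raise ValueError(f"Not a GitHub issue URL: {issue_url!r}")
    owner, repo, number = match.groups()
    return owner, repo, int(number)
```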

GitLab API Module

GitLab API client.

class sw_metadata_bot.gitlab_api.GitLabAPI(token=None, dry_run=False)

Bases: IssueAPIBase

Simple GitLab API client.

Parameters:

token (str | None)
dry_run (bool)

__init__(token=None, dry_run=False)

Initialize GitLab API client.

Parameters:

token (str | None)
dry_run (bool)

static parse_repo_url(url)

Parse GitLab URL to extract host, owner, and repo.

Returns: Tuple of (host, owner, repo_name)
Parameters: url (str)
Return type: tuple[str, str, str]

get_base_url(host)

Get API base URL for GitLab host.

Parameters: host (str)
Return type: str
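GitLab serves its REST API under /api/v4 on gitlab.com and on self-managed instances, and it accepts a URL-encoded "namespace/project" path wherever a project ID is expected. A base-URL helper can therefore be as small as the sketch below. Defaulting to HTTPS and both function names are assumptions; the real method may also honor configuration for self-hosted instances.

```python
from urllib.parse import quote


def gitlab_api_base_sketch(host: str) -> str:
    """Hypothetical: build the REST API v4 base URL for a GitLab host."""
    host = host.strip().rstrip("/")
    # Accept either a bare host ("gitlab.com") or a full origin URL.
    if not host.startswith(("http://", "https://")):
        host = f"https://{host}"
    return f"{host}/api/v4"


def gitlab_project_endpoint_sketch(host: str, project_path: str) -> str:
    """Hypothetical: project endpoint using an encoded namespace/project path."""
    # safe="" makes quote() encode "/" as %2F, as the API requires.
    return f"{gitlab_api_base_sketch(host)}/projects/{quote(project_path, safe='')}"
```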

check_auth(host='gitlab.com')

Check whether authentication works.

Parameters: host (str)
Return type: bool

verify_auth(host='gitlab.com')

Verify authentication and return detailed information.

Returns: Dictionary with authentication details including user, scopes, and permissions.
Parameters: host (str)
Return type: dict

create_issue(repo_url, title, body)

Create an issue on GitLab.

Returns:

URL of created issue (or fake URL in dry-run mode)

Parameters:

repo_url (str)
title (str)
body (str)

Return type:

str

static parse_issue_url(issue_url)

Parse a GitLab issue URL and return host/owner/repo/iid.

Parameters: issue_url (str)
Return type: tuple[str, str, str, int]

History Module

Helpers for loading and querying previous issue reports.

sw_metadata_bot.history.load_previous_report(report_path)

Load report.json and index issue-lifecycle entries by repository URL.

Parameters: report_path (Path | None)
Return type: dict[str, dict]

sw_metadata_bot.history.load_previous_commit_report(report_path)

Load report.json and index entries by repository for commit-based pre-skip.

Parameters: report_path (Path | None)
Return type: dict[str, dict]

sw_metadata_bot.history.findings_signature(pitfall_ids, warning_ids)

Build a deterministic findings signature from pitfall and warning IDs.

Parameters:

pitfall_ids (list[str] | None)
warning_ids (list[str] | None)

Return type:

str
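One way to build a deterministic signature is to normalize the IDs (treat None as empty, de-duplicate, sort) and hash a canonical string with SHA-256. Whether the real function sorts, de-duplicates, or hashes at all is not stated in this reference; the sketch, with its hypothetical name, only illustrates the property that matters when comparing runs: equal findings yield equal signatures.

```python
import hashlib
from typing import Optional


def findings_signature_sketch(
    pitfall_ids: Optional[list[str]], warning_ids: Optional[list[str]]
) -> str:
    """Hypothetical: order-independent signature of pitfall and warning IDs."""
    pitfall_part = ",".join(sorted(set(pitfall_ids or [])))
    warning_part = ",".join(sorted(set(warning_ids or [])))
    # Label each group so an ID cannot move between groups unnoticed.
    canonical = f"P:{pitfall_part}|W:{warning_part}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```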

Incremental Module

Decision engine for incremental issue lifecycle handling.

class sw_metadata_bot.incremental.Decision(action, reason)

Bases: object

Decision outcome for a repository in incremental mode.

Parameters:

action (str)
reason (str)

action: str

reason: str

__init__(action, reason)

Parameters:

action (str)
reason (str)

Return type:

None

sw_metadata_bot.incremental.evaluate(*, previous_exists, unsubscribed, repo_updated, has_findings, identical_findings, previous_issue_open, codemeta_missing, previous_codemeta_missing)

Evaluate the configured decision tree and return action + reason.

This function implements a cascading decision tree that determines whether to create a new issue, update an existing one with a comment, close it, or stop (skip). The logic prioritizes certain conditions to prevent unnecessary noise.

Decision Tree (evaluated in order):

1. NO PREVIOUS ANALYSIS
     action="create" (first-time analysis, always create issue)

2. UNSUBSCRIBE DETECTED
     action="stop" (user explicitly unsubscribed, respect their choice)

3. REPOSITORY NOT UPDATED
     action="stop" (no changes since last analysis, skip)

4. MISSING CODEMETA WITHOUT OTHER FINDINGS
     Check if codemeta status changed:
     - If issue open AND codemeta status unchanged:
         action="stop" (already reported, issue still relevant)
     - If issue open AND codemeta status changed:
         action="comment" (report that codemeta was added/removed)
     - If no issue open:
         action="create" (new codemeta issue)

5. NO FINDINGS (REPO IS CLEAN)
     - If issue is open:
         action="close" (metadata quality improved, close issue)
     - If no issue:
         action="stop" (nothing to report)

6. FINDINGS IDENTICAL TO PREVIOUS
     Check issue state:
     - If issue open:
         action="stop" (same issue already posted)
     - If issue closed:
         action="create" (quality got worse again after improvements)

7. FINDINGS CHANGED (DEFAULT CASE)
     Check issue state:
     - If issue open:
         action="comment" (update existing issue with new findings)
     - If no issue open:
         action="create" (quality changed while issue is closed)

Parameters:

previous_exists (bool) – Whether a previous analysis snapshot exists
unsubscribed (bool) – Whether unsubscribe comment detected on existing issue
repo_updated (bool) – Whether repository has new commits since last analysis
has_findings (bool) – Whether current analysis found metadata issues
identical_findings (bool) – Whether findings are identical to previous run
previous_issue_open (bool) – Whether previously opened issue is still open
codemeta_missing (bool) – Whether codemeta.json is missing in current analysis
previous_codemeta_missing (bool) – Whether codemeta.json was missing in previous analysis

Returns:

Decision object with action and reason explaining the choice

Return type:

Decision

Note

For research software: This decision tree is intentionally conservative, favoring skipping unnecessary issues over creating duplicate noise. Changes to this logic should be discussed as they affect user experience.
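The decision tree above can be transcribed into ordered early returns. This is a sketch that follows the documented order only: the class and function names are hypothetical, the reason strings are placeholders (the real reason codes are not listed here), and step 4 is read as "codemeta is missing and there are no other findings".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionSketch:
    """Mirrors the documented Decision(action, reason) shape."""

    action: str
    reason: str


def evaluate_sketch(
    *,
    previous_exists: bool,
    unsubscribed: bool,
    repo_updated: bool,
    has_findings: bool,
    identical_findings: bool,
    previous_issue_open: bool,
    codemeta_missing: bool,
    previous_codemeta_missing: bool,
) -> DecisionSketch:
    """Hypothetical transcription of the documented decision tree."""
    # 1. First-time analysis: always open an issue.
    if not previous_exists:
        return DecisionSketch("create", "first_analysis")
    # 2. Respect an explicit unsubscribe.
    if unsubscribed:
        return DecisionSketch("stop", "unsubscribed")
    # 3. Nothing changed in the repository since the last run.
    if not repo_updated:
        return DecisionSketch("stop", "repo_not_updated")
    # 4. Missing codemeta is the only problem.
    if codemeta_missing and not has_findings:
        codemeta_changed = codemeta_missing != previous_codemeta_missing
        if previous_issue_open:
            if codemeta_changed:
                return DecisionSketch("comment", "codemeta_status_changed")
            return DecisionSketch("stop", "codemeta_already_reported")
        return DecisionSketch("create", "codemeta_missing")
    # 5. Clean repository: close an open issue, otherwise do nothing.
    if not has_findings:
        if previous_issue_open:
            return DecisionSketch("close", "repo_clean")
        return DecisionSketch("stop", "no_findings")
    # 6. Same findings as the previous run.
    if identical_findings:
        if previous_issue_open:
            return DecisionSketch("stop", "identical_findings_issue_open")
        return DecisionSketch("create", "identical_findings_issue_closed")
    # 7. Findings changed (default case).
    if previous_issue_open:
        return DecisionSketch("comment", "findings_changed")
    return DecisionSketch("create", "findings_changed_no_open_issue")
```

Early returns keep the documented evaluation order explicit: each guard only runs once every earlier condition has been ruled out.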

Main Module

CLI entry point for sw-metadata-bot.

sw_metadata_bot.main.main()

Entry point for the CLI.

Pipeline Module

Pipeline command to run analysis workflows.

sw_metadata_bot.pipeline.find_latest_previous_report(output_root, run_name, current_snapshot_tag)

Find latest previous report path from same run folder.

Parameters:

output_root (Path)
run_name (str)
current_snapshot_tag (str | None)

Return type:

Path | None

sw_metadata_bot.pipeline.run_pipeline(config_file, dry_run, snapshot_tag, previous_report, force_analysis=False)

Run analysis and write issue decision records without API side effects.

When force_analysis is True, the pipeline bypasses artifact reuse for unchanged repositories and treats them as if they had been updated.

Parameters:

config_file (Path)
dry_run (bool)
snapshot_tag (str | None)
previous_report (Path | None)
force_analysis (bool)

Return type:

None

Pitfalls Module

Pitfalls data loading and parsing.

sw_metadata_bot.pitfalls.load_pitfalls(file_path)

Load pitfalls from JSON-LD file.

Parameters: file_path (Path)
Return type: dict

sw_metadata_bot.pitfalls.get_repository_url(data)

Extract repository URL from pitfalls data.

Parameters: data (dict)
Return type: str

sw_metadata_bot.pitfalls.get_pitfalls_list(data)

Get list of pitfall checks from data.

Parameters: data (dict)
Return type: list[dict]

sw_metadata_bot.pitfalls.get_warnings_list(data)

Get list of warning checks from data.

Parameters: data (dict)
Return type: list[dict]

sw_metadata_bot.pitfalls.get_rsmetacheck_version(data)

Get the version of RSMetacheck used for analysis. The version is read from checkingSoftware.softwareVersion; “unknown” is returned if it is not found.

Parameters: data (dict)
Return type: str
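Reading the version with the documented fallback might look like this, assuming checkingSoftware is a JSON object at the top level of the loaded pitfalls data. JSON-LD documents can nest or list nodes differently, which the real function may handle; the function name is hypothetical.

```python
from typing import Any


def rsmetacheck_version_sketch(data: dict[str, Any]) -> str:
    """Hypothetical: return checkingSoftware.softwareVersion or "unknown"."""
    software = data.get("checkingSoftware")
    if isinstance(software, dict):
        version = software.get("softwareVersion")
        if isinstance(version, str) and version.strip():
            return version.strip()
    return "unknown"
```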

sw_metadata_bot.pitfalls.format_report(repo_url, data, *, codemeta_missing=False, generated_codemeta=None)

Format pitfalls data into a readable report.

Parameters:

repo_url (str)
data (dict)
codemeta_missing (bool)
generated_codemeta (dict | None)

Return type:

str

sw_metadata_bot.pitfalls.create_issue_body(report, custom_message=None)

Wrap report in issue template using optional custom message or default greetings.

Parameters:

report (str)
custom_message (str | None)

Return type:

str

Publish Module

Publish issues from an existing analysis snapshot.

class sw_metadata_bot.publish.FakeIssueClient(comments_for=None)

Bases: object

Issue client used only for local publish simulation.

__init__(comments_for=None)

Initialize the fake issue client.

create_issue(repo_url, title, body)

Create an issue and return a simulated issue URL.

Parameters:

repo_url (str)
title (str)
body (str)

Return type:

str

get_issue(issue_url)

Return simulated issue data, with state ‘open’ by default (can be overridden by test setup).

Parameters: issue_url (str)
Return type: dict[str, object]

get_issue_comments(issue_url)

Get simulated comments for the issue URL, as provided by the comments_for function.

Parameters: issue_url (str)
Return type: list[str]

add_issue_comment(issue_url, body)

Add a comment to the issue URL (recording the action for test verification).

Parameters:

issue_url (str)
body (str)

Return type:

None

close_issue(issue_url)

Simulate closing the issue at the given URL (recording the action for test verification).

Parameters: issue_url (str)
Return type: None

sw_metadata_bot.publish.publish_analysis(analysis_root, retry_failed=False, github_client=None, gitlab_client=None)

Publish issues from an existing analysis snapshot without re-running analysis.

Parameters:

analysis_root (Path)
retry_failed (bool)
github_client (GitHubAPI | None)
gitlab_client (GitLabAPI | None)

Return type:

None

Token Resolver Module

Token resolution helpers for API clients.

sw_metadata_bot.token_resolver.resolve_token(*, explicit_token, env_var_name, dry_run)

Resolve token with precedence: explicit > env > .env fallback.

Parameters:

explicit_token (str | None)
env_var_name (str)
dry_run (bool)

Return type:

str | None
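The documented precedence (explicit token, then environment variable, then a .env file) can be sketched with the standard library alone. The minimal KEY=VALUE parser, the dotenv_path parameter, and the function names are assumptions, and the sketch omits dry_run because this reference does not say how it changes resolution.

```python
import os
from pathlib import Path
from typing import Optional


def _read_dotenv(path: Path) -> dict[str, str]:
    """Minimal .env reader: KEY=VALUE lines, '#' comments, optional quotes."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_token_sketch(
    explicit_token: Optional[str], env_var_name: str, dotenv_path: Path = Path(".env")
) -> Optional[str]:
    """Hypothetical: explicit > environment > .env file > None."""
    if explicit_token:
        return explicit_token
    from_env = os.environ.get(env_var_name)
    if from_env:
        return from_env
    return _read_dotenv(dotenv_path).get(env_var_name) or None
```

Checking the process environment before the .env file lets CI secrets override a developer's local file without editing it.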

Verify Tokens Module

Token verification command: checks whether tokens have the correct permissions.