API Reference

This section documents all modules and functions in the sw-metadata-bot package.

Analysis Runtime Module

Low-level analysis workflow helpers for pipeline orchestration.

class sw_metadata_bot.analysis_runtime.CurrentAnalysisContext(repo_url, pitfall_file, data, pitfalls_count, warnings_count, pitfalls_ids, warnings_ids, analysis_date, rsmetacheck_version, findings_signature, has_findings, codemeta_status, codemeta_missing, codemeta_generated, generated_codemeta)

Bases: object

Parsed current-analysis state needed to build a decision record.

Parameters:
  • repo_url (str)

  • pitfall_file (Path)

  • data (dict[str, Any])

  • pitfalls_count (int)

  • warnings_count (int)

  • pitfalls_ids (list[str])

  • warnings_ids (list[str])

  • analysis_date (str)

  • rsmetacheck_version (str)

  • findings_signature (str)

  • has_findings (bool)

  • codemeta_status (str)

  • codemeta_missing (bool)

  • codemeta_generated (bool)

  • generated_codemeta (dict[str, Any] | None)

repo_url: str
pitfall_file: Path
data: dict[str, Any]
pitfalls_count: int
warnings_count: int
pitfalls_ids: list[str]
warnings_ids: list[str]
analysis_date: str
rsmetacheck_version: str
findings_signature: str
has_findings: bool
codemeta_status: str
codemeta_missing: bool
codemeta_generated: bool
generated_codemeta: dict[str, Any] | None
__init__(repo_url, pitfall_file, data, pitfalls_count, warnings_count, pitfalls_ids, warnings_ids, analysis_date, rsmetacheck_version, findings_signature, has_findings, codemeta_status, codemeta_missing, codemeta_generated, generated_codemeta)
Return type:

None

class sw_metadata_bot.analysis_runtime.PreviousAnalysisContext(previous_exists, previous_issue_url, previous_issue_state, previous_commit_id, previous_signature, previous_issue_open, previous_codemeta_missing, repo_updated)

Bases: object

Previous-analysis state needed for incremental decision making.

Parameters:
  • previous_exists (bool)

  • previous_issue_url (str | None)

  • previous_issue_state (str | None)

  • previous_commit_id (str | None)

  • previous_signature (str)

  • previous_issue_open (bool)

  • previous_codemeta_missing (bool)

  • repo_updated (bool)

previous_exists: bool
previous_issue_url: str | None
previous_issue_state: str | None
previous_commit_id: str | None
previous_signature: str
previous_issue_open: bool
previous_codemeta_missing: bool
repo_updated: bool
__init__(previous_exists, previous_issue_url, previous_issue_state, previous_commit_id, previous_signature, previous_issue_open, previous_codemeta_missing, repo_updated)
Return type:

None

sw_metadata_bot.analysis_runtime.extract_previous_commit(record)

Return previous commit id from report records with compatibility fallback.

Parameters:

record (dict)

Return type:

str | None

sw_metadata_bot.analysis_runtime.resolve_per_repo_paths(analysis_root, repo_url)

Compute per-repository output paths within the analysis root.

Parameters:
  • analysis_root (Path)

  • repo_url (str)

Return type:

dict[str, Path]

sw_metadata_bot.analysis_runtime.copy_previous_repo_artifacts(previous_repo_folder, current_repo_folder)

Copy previous snapshot repository artifacts into current snapshot folder.

Parameters:
  • previous_repo_folder (Path)

  • current_repo_folder (Path)

Return type:

None

sw_metadata_bot.analysis_runtime.load_previous_repo_record(previous_snapshot_root, repo_url)

Load previous per-repo record from previous snapshot if available.

Parameters:
  • previous_snapshot_root (Path | None)

  • repo_url (str)

Return type:

dict | None

sw_metadata_bot.analysis_runtime.standardize_metacheck_outputs(repo_folder)

Normalize metacheck output names to stable per-repo filenames.

RSMetacheck outputs multiple artifacts with varying names depending on tool version and configuration. This function consolidates them into a standard naming scheme for consistent downstream processing.

Normalization strategy:

  • Pitfalls (JSON-LD): often named after the repository or a timestamp. Standardized to pitfall.jsonld.

  • SOMEF output: can be nested in a subdirectory or placed at the folder root. Standardized to somef_output.json.

  • Generated codemeta: created by rsmetacheck when metadata is missing. Standardized to codemeta_generated.json.

File discovery uses a fallback strategy:

  1. Try the explicit subdirectory (metacheck's preferred location).

  2. Fall back to glob patterns if the subdirectory is empty.

  3. Apply heuristics (payload inspection) to disambiguate similar files.

This lets the function handle output from different metacheck versions without failing when the directory structure differs from what is expected.

Parameters:

repo_folder (Path)

Return type:

None
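The three-step discovery described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper names find_pitfall_file and standardize_pitfall_file are hypothetical, the default subdirectory name pitfalls_outputs is borrowed from the run_rsmetacheck default, and the "@context" payload check is an assumed stand-in for whatever heuristic the real function applies.

```python
from __future__ import annotations

import json
from pathlib import Path


def find_pitfall_file(repo_folder: Path, subdir: str = "pitfalls_outputs") -> Path | None:
    """Locate a pitfalls JSON-LD file using a three-step fallback (hypothetical)."""
    # 1. Try the explicit subdirectory first.
    preferred = repo_folder / subdir
    candidates = sorted(preferred.glob("*.jsonld")) if preferred.is_dir() else []
    # 2. Fall back to a recursive glob when the subdirectory is missing or empty.
    if not candidates:
        candidates = sorted(repo_folder.rglob("*.jsonld"))
    # 3. Disambiguate by payload: keep the first file that parses as a JSON
    #    object carrying an "@context" key (assumed heuristic).
    for path in candidates:
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue
        if isinstance(payload, dict) and "@context" in payload:
            return path
    return None


def standardize_pitfall_file(repo_folder: Path) -> None:
    """Copy the discovered file to the stable name pitfall.jsonld."""
    target = repo_folder / "pitfall.jsonld"
    found = find_pitfall_file(repo_folder)
    if found is not None and found != target:
        target.write_bytes(found.read_bytes())
```

Copying rather than moving keeps the original artifact in place, which makes the sketch safe to re-run on the same folder.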

sw_metadata_bot.analysis_runtime.run_metacheck_for_repo(repo_url, repo_folder, *, generate_codemeta_if_missing)

Run metacheck for a single repository URL into its own folder.

Parameters:
  • repo_url (str)

  • repo_folder (Path)

  • generate_codemeta_if_missing (bool)

Return type:

None

sw_metadata_bot.analysis_runtime.build_analysis_counters(records)

Build analysis counters using the unified report schema.

Parameters:

records (list[dict[str, object]])

Return type:

dict[str, int]

sw_metadata_bot.analysis_runtime.build_analysis_run_report(records, *, dry_run, run_root, analysis_summary_file, previous_report)

Build run-level report payload from analysis decision records.

Return type:

dict[str, object]

sw_metadata_bot.analysis_runtime.detect_repo_platform(repo_url)

Detect publish platform from a repository URL.

Parameters:

repo_url (str)

Return type:

str | None

sw_metadata_bot.analysis_runtime.is_previous_issue_open(previous_record)

Infer whether previous issue was open from stored metadata only.

Parameters:

previous_record (dict[str, object])

Return type:

bool

sw_metadata_bot.analysis_runtime.build_record_entry(*, run_root, repo_url, platform, pitfalls_count, warnings_count, analysis_date, rsmetacheck_version, pitfalls_ids, warnings_ids, action, reason_code, findings_signature, current_commit_id, previous_commit_id, previous_issue_url, previous_issue_state, dry_run, issue_persistence, issue_url, file_path, codemeta_generated=None, codemeta_status=None, error=None)

Build a per-repository analysis record payload.

Parameters:
  • run_root (Path)

  • repo_url (str)

  • platform (str | None)

  • pitfalls_count (int)

  • warnings_count (int)

  • analysis_date (str)

  • rsmetacheck_version (str)

  • pitfalls_ids (list[str])

  • warnings_ids (list[str])

  • action (str)

  • reason_code (str)

  • findings_signature (str)

  • current_commit_id (str | None)

  • previous_commit_id (str | None)

  • previous_issue_url (str | None)

  • previous_issue_state (str | None)

  • dry_run (bool)

  • issue_persistence (str)

  • issue_url (str | None)

  • file_path (Path)

  • codemeta_generated (bool | None)

  • codemeta_status (str | None)

  • error (str | None)

Return type:

dict[str, object]

sw_metadata_bot.analysis_runtime.write_analysis_repo_report(repo_folder, record, *, dry_run, run_root, analysis_summary_file, previous_report)

Write per-repository analysis report using analysis-stage counters.

Return type:

None

sw_metadata_bot.analysis_runtime.create_analysis_record(*, run_root, repo_url, repo_folder, previous_record, current_commit_id, dry_run, custom_message, force_analysis=False)

Create a decision record for a repository without platform API calls.

Parameters:
  • run_root (Path)

  • repo_url (str)

  • repo_folder (Path)

  • previous_record (dict[str, object] | None)

  • current_commit_id (str | None)

  • dry_run (bool)

  • custom_message (str | None)

  • force_analysis (bool)

Return type:

dict[str, object]

Check Parsing Module

Shared parsing helpers for RSMetacheck check identifiers.

RSMetacheck evaluates each repository against a catalog of metadata-quality checks. Each check is identified by a short code: a P prefix for Pitfalls (high-priority issues that indicate missing or invalid metadata) or a W prefix for Warnings (informational checks or best-practice recommendations), followed by a 3- or 4-digit numeric code within that category.

Example check codes:

  • P001: Repository lacks codemeta.json file

  • W001: Incomplete metadata field descriptions

  • P042: Missing license information

See constants.py for related definitions (CHECK_TYPE_*, CHECK_CODE_REGEX_PATTERN).
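A minimal sketch of recognizing and classifying these codes, assuming the format described above (a P or W prefix followed by 3 or 4 digits). The pattern and the classify_check_code helper are hypothetical; the real CHECK_CODE_REGEX_PATTERN in constants.py may be written differently.

```python
from __future__ import annotations

import re

# Assumed pattern: P or W, then 3 or 4 digits, bounded on both sides so that
# longer digit runs (e.g. P12345) are not accepted.
CHECK_CODE_PATTERN = re.compile(r"\b([PW])(\d{3,4})\b")


def classify_check_code(text: str) -> tuple[str, str] | None:
    """Return ("pitfall" | "warning", code) for the first code found in text."""
    match = CHECK_CODE_PATTERN.search(text)
    if match is None:
        return None
    kind = "pitfall" if match.group(1) == "P" else "warning"
    return kind, match.group(0)
```

Because search() scans the whole string, the same helper works on a bare code or on a catalog URL that ends in one (the URL in the tests is a placeholder).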

sw_metadata_bot.check_parsing.get_check_catalog_id(check)

Return the full RSMetacheck catalog ID URL for a check when available.

The preferred source is the new-schema key assessesIndicator.@id, when it points to the RSMetacheck catalog. For backward compatibility, the function falls back to the legacy pitfall key.

Parameters:

check (dict)

Return type:

str

sw_metadata_bot.check_parsing.get_short_check_code(check)

Return short check code such as P001 or W004.

Parameters:

check (dict)

Return type:

str

sw_metadata_bot.check_parsing.is_check_reported(check)

Return True only when a check is explicitly reported by metacheck.

Verbose metacheck output marks each evaluated check with an output key. Only values representing true are considered reported findings.

Parameters:

check (dict)

Return type:

bool
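One way to express the "explicitly true" rule, assuming each check dict stores its result under a key named output and that either the boolean True or the string "true" (in any case) counts as reported. The key name, the accepted spellings, and the function name are assumptions for illustration.

```python
from __future__ import annotations

from typing import Any


def is_check_reported_sketch(check: dict[str, Any]) -> bool:
    """Return True only when the check's "output" value explicitly means true."""
    value = check.get("output")
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() == "true"
    # Anything else (missing key, None, numbers, lists) is not a finding.
    return False
```

Treating every non-boolean, non-string value as "not reported" errs on the side of fewer findings, which matches the conservative intent of only counting explicit results.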

sw_metadata_bot.check_parsing.extract_check_ids(checks)

Extract ordered unique pitfall and warning codes from check entries.

Parameters:

checks (list[dict])

Return type:

tuple[list[str], list[str]]
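The "ordered unique" behaviour can be sketched with dict.fromkeys, which removes duplicates while keeping first-seen order. For brevity, this hypothetical helper takes short codes such as P001 directly; the real extract_check_ids receives full check dicts and derives the codes itself.

```python
from __future__ import annotations


def extract_check_ids_sketch(codes: list[str]) -> tuple[list[str], list[str]]:
    """Split short codes into (pitfalls, warnings), keeping first-seen order."""
    # dict.fromkeys de-duplicates while preserving insertion order.
    unique = list(dict.fromkeys(codes))
    pitfalls = [code for code in unique if code.startswith("P")]
    warnings = [code for code in unique if code.startswith("W")]
    return pitfalls, warnings
```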

Commit Lookup Module

Repository head commit lookup utilities.

sw_metadata_bot.commit_lookup.parse_github_repo(repo_url)

Parse owner/repo from a GitHub repository URL.

Parameters:

repo_url (str)

Return type:

tuple[str, str] | None
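A sketch of owner/repo parsing with urllib.parse, assuming only github.com URLs are accepted and that a trailing slash or .git suffix should be ignored. parse_github_repo_sketch is a hypothetical name, not the package's function.

```python
from __future__ import annotations

from urllib.parse import urlparse


def parse_github_repo_sketch(repo_url: str) -> tuple[str, str] | None:
    """Return (owner, repo) for github.com URLs, else None."""
    parsed = urlparse(repo_url.strip())
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        return None
    # Empty segments come from leading/trailing slashes; drop them.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    # Drop a trailing ".git" so clone URLs map to the same repository.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo
```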

sw_metadata_bot.commit_lookup.resolve_gitlab_project_path(repo_url)

Parse host and project path for GitLab repositories.

Parameters:

repo_url (str)

Return type:

tuple[str, str] | None

sw_metadata_bot.commit_lookup.is_commit_hash(value)

Return True if value looks like a commit hash.

Parameters:

value (str)

Return type:

bool
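A minimal sketch of a "looks like a commit hash" test. The accepted range of 7 to 40 hexadecimal characters (an abbreviated to a full SHA-1 id) is an assumption; the package may use a stricter or looser rule, and repositories using SHA-256 object ids would produce 64-character hashes that this sketch rejects.

```python
import re

# Assumed rule: 7 to 40 lowercase hex digits, matched against the whole string.
_COMMIT_RE = re.compile(r"[0-9a-f]{7,40}")


def is_commit_hash_sketch(value: str) -> bool:
    """Return True when value is entirely hex and 7-40 characters long."""
    return bool(_COMMIT_RE.fullmatch(value.strip().lower()))
```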

sw_metadata_bot.commit_lookup.get_github_head_commit(repo_url, token=None)

Fetch current head commit from GitHub API.

Parameters:
  • repo_url (str)

  • token (str | None)

Return type:

str | None

sw_metadata_bot.commit_lookup.get_gitlab_head_commit(repo_url, token=None)

Fetch current head commit from GitLab API for gitlab* hosts.

Parameters:
  • repo_url (str)

  • token (str | None)

Return type:

str | None

sw_metadata_bot.commit_lookup.get_generic_git_head_commit(repo_url)

Fetch current head commit via git ls-remote as generic fallback.

Parameters:

repo_url (str)

Return type:

str | None
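The command git ls-remote <url> HEAD prints one line per matching ref in the form <sha><TAB>HEAD. The sketch below runs that command and parses its output; the timeout value and the choice to return None on any failure are assumptions, and both function names are hypothetical.

```python
from __future__ import annotations

import subprocess


def parse_ls_remote_head(output: str) -> str | None:
    """Return the commit id from 'git ls-remote <url> HEAD' output."""
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if ref.strip() == "HEAD" and sha:
            return sha.strip()
    return None


def generic_head_commit_sketch(repo_url: str, timeout: float = 30.0) -> str | None:
    """Query the remote HEAD with git; return None on any failure."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", repo_url, "HEAD"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # OSError: git not installed; SubprocessError covers non-zero exit
        # (CalledProcessError) and TimeoutExpired.
        return None
    return parse_ls_remote_head(result.stdout)
```

Splitting the parser from the subprocess call keeps the parsing logic testable without network access.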

sw_metadata_bot.commit_lookup.get_repo_head_commit(repo_url)

Fetch current head commit using API-first and git fallback strategies.

Parameters:

repo_url (str)

Return type:

str | None
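The "API-first, then git" approach amounts to trying lookup strategies in order until one yields a commit. In this hypothetical sketch the strategies are plain callables (in the package they would presumably be the platform API lookups followed by get_generic_git_head_commit); swallowing exceptions from a failed strategy is an assumption about the real behaviour.

```python
from __future__ import annotations

from typing import Callable, Optional

Strategy = Callable[[str], Optional[str]]


def first_commit(repo_url: str, strategies: list[Strategy]) -> str | None:
    """Try each lookup strategy in order; return the first non-empty result."""
    for strategy in strategies:
        try:
            commit = strategy(repo_url)
        except Exception:
            # A failing strategy (e.g. rate-limited API) must not block fallbacks.
            continue
        if commit:
            return commit
    return None
```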

Config Utils Module

Helpers for the unified configuration file.

sw_metadata_bot.config_utils.normalize_repo_url(url)

Normalize repository URLs for matching and persistence.

Parameters:

url (str)

Return type:

str
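The exact normalization rules are not documented here. A plausible minimal sketch, assuming normalization means trimming whitespace, lowercasing the scheme and host, and dropping a trailing slash or .git suffix (path case is preserved, and query strings and fragments are discarded):

```python
from urllib.parse import urlparse, urlunparse


def normalize_repo_url_sketch(url: str) -> str:
    """Assumed normalization rules; the package's function may differ."""
    parsed = urlparse(url.strip())
    path = parsed.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Rebuild with empty params/query/fragment.
    return urlunparse((parsed.scheme.lower(), parsed.netloc.lower(), path, "", "", ""))
```

Normalizing before comparison means the same repository listed as a clone URL and as a browser URL is matched to one record.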

sw_metadata_bot.config_utils.detect_platform(url)

Detect publishing platform from repository URL.

Returns "github" for GitHub URLs, "gitlab" for any GitLab URL, or None when the URL does not match a known platform.

Parameters:

url (str)

Return type:

str | None
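A sketch of host-based platform detection. It assumes, consistent with the "gitlab* hosts" wording used for get_gitlab_head_commit, that any host whose name starts with gitlab is treated as GitLab; self-hosted GitLab instances on other host names would not be detected by this rule.

```python
from __future__ import annotations

from urllib.parse import urlparse


def detect_platform_sketch(url: str) -> str | None:
    # urlparse(...).hostname is already lowercased, or None when absent.
    host = urlparse(url.strip()).hostname or ""
    if host in {"github.com", "www.github.com"}:
        return "github"
    # Assumption: gitlab.com and hosts such as gitlab.example.org count as GitLab.
    if host.startswith("gitlab"):
        return "gitlab"
    return None
```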

sw_metadata_bot.config_utils.load_config(config_path)

Load and validate a unified configuration file.

Parameters:

config_path (Path)

Return type:

dict

sw_metadata_bot.config_utils.get_repositories(config)

Return normalized repositories preserving order and uniqueness.

Parameters:

config (dict)

Return type:

list[str]

sw_metadata_bot.config_utils.get_custom_message(config)

Return the configured issue custom message if present.

Parameters:

config (dict)

Return type:

str | None

sw_metadata_bot.config_utils.get_opt_out_repositories(config)

Return normalized repository URLs configured as inline opt-outs.

Parameters:

config (dict)

Return type:

set[str]

sw_metadata_bot.config_utils.get_generate_codemeta_if_missing(config)

Return whether codemeta suggestions should be generated when missing.

Parameters:

config (dict)

Return type:

bool

sw_metadata_bot.config_utils.append_opt_out_repository(config_path, repo_url)

Persist a repository to the inline opt-outs list when not already present.

Parameters:
  • config_path (Path)

  • repo_url (str)

Return type:

bool

sw_metadata_bot.config_utils.resolve_output_root(config, config_path)

Return the configured output root, resolving relative paths against the project root.

Return type:

Path

sw_metadata_bot.config_utils.resolve_run_name(config, config_path)

Return the configured run name or a sensible default.

Return type:

str

sw_metadata_bot.config_utils.resolve_snapshot_tag(config, explicit_snapshot_tag)

Resolve the snapshot tag from CLI override or config defaults.

Parameters:
  • config (dict)

  • explicit_snapshot_tag (str | None)

Return type:

str | None

sw_metadata_bot.config_utils.sanitize_repo_name(repo_url)

Sanitize repository URL to a safe folder name format.

Uses a generic URL-safe transformation so non-standard URLs still map to deterministic folder names.

Parameters:

repo_url (str) – Repository URL or identifier string

Returns:

Sanitized folder name (lowercase, with underscores as the only separator)

Return type:

str
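A minimal sketch of a deterministic, URL-safe folder name, assuming the transformation drops the URL scheme, replaces every run of non-alphanumeric characters with a single underscore, and lowercases the result. The scheme stripping and the function name are assumptions.

```python
import re


def sanitize_repo_name_sketch(repo_url: str) -> str:
    """Map a URL to a lowercase folder name made of [a-z0-9_]."""
    # Drop the scheme so "https://" does not leak into the name.
    name = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", repo_url.strip())
    # Collapse each run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^a-zA-Z0-9]+", "_", name).strip("_")
    return name.lower()
```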

sw_metadata_bot.config_utils.copy_config_to_analysis_root(config_path, analysis_root)

Copy the configuration file to the analysis root directory.

Parameters:
  • config_path (Path) – Path to the input configuration file

  • analysis_root (Path) – Root analysis directory where config will be copied

Raises:

IOError – If copying fails

Return type:

None

RSMetaCheck Wrapper Module

Wrapper for rsmetacheck CLI to integrate with sw-metadata-bot.

sw_metadata_bot.rsmetacheck_wrapper.run_rsmetacheck(*, input_source, skip_somef=False, somef_output='somef_outputs', pitfalls_output='pitfalls_outputs', analysis_output='analysis_results.json', threshold=0.8, generate_codemeta=False)

Run rsmetacheck CLI by constructing and forwarding argv.

Parameters:
  • input_source (str)

  • skip_somef (bool)

  • somef_output (str)

  • pitfalls_output (str)

  • analysis_output (str)

  • threshold (float)

  • generate_codemeta (bool)

Return type:

None

GitHub API Module

GitHub API client.

class sw_metadata_bot.github_api.GitHubAPI(token=None, dry_run=False)

Bases: IssueAPIBase

Simple GitHub API client.

Parameters:
  • token (str | None)

  • dry_run (bool)

__init__(token=None, dry_run=False)

Initialize GitHub API client.

Parameters:
  • token (str | None)

  • dry_run (bool)

static parse_repo_url(url)

Parse GitHub URL to extract owner and repo.

Returns:

Tuple of (owner, repo_name)

Parameters:

url (str)

Return type:

tuple[str, str]

check_auth()

Check whether authentication works.

Return type:

bool

verify_auth()

Verify authentication and return detailed information.

Returns:

Dictionary with authentication details including user, scopes, and permissions.

Return type:

dict

create_issue(repo_url, title, body)

Create an issue on GitHub.

Returns:

URL of the created issue (or a placeholder URL in dry-run mode)

Return type:

str

static parse_issue_url(issue_url)

Parse a GitHub issue URL and return owner/repo/number.

Parameters:

issue_url (str)

Return type:

tuple[str, str, int]
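GitHub issue URLs follow the shape https://github.com/<owner>/<repo>/issues/<number>. A minimal parsing sketch under that assumption; raising ValueError for other URLs is a hypothetical choice, not documented behaviour.

```python
from __future__ import annotations

from urllib.parse import urlparse


def parse_github_issue_url_sketch(issue_url: str) -> tuple[str, str, int]:
    """Parse https://github.com/<owner>/<repo>/issues/<number>."""
    parts = [p for p in urlparse(issue_url).path.split("/") if p]
    if len(parts) != 4 or parts[2] != "issues" or not parts[3].isdigit():
        raise ValueError(f"Not a GitHub issue URL: {issue_url}")
    return parts[0], parts[1], int(parts[3])
```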

GitLab API Module

GitLab API client.

class sw_metadata_bot.gitlab_api.GitLabAPI(token=None, dry_run=False)

Bases: IssueAPIBase

Simple GitLab API client.

Parameters:
  • token (str | None)

  • dry_run (bool)

__init__(token=None, dry_run=False)

Initialize GitLab API client.

Parameters:
  • token (str | None)

  • dry_run (bool)

static parse_repo_url(url)

Parse GitLab URL to extract host, owner, and repo.

Returns:

Tuple of (host, owner, repo_name)

Parameters:

url (str)

Return type:

tuple[str, str, str]

get_base_url(host)

Get API base URL for GitLab host.

Parameters:

host (str)

Return type:

str
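GitLab serves its REST API under /api/v4, and project endpoints accept a URL-encoded project path (for example group%2Fproject) in place of a numeric project ID. The helpers below sketch both ideas; their names are hypothetical, and defaulting bare host names to https:// is an assumption.

```python
from urllib.parse import quote


def gitlab_api_base_url(host: str) -> str:
    """Return the REST API v4 base URL for a GitLab host."""
    host = host.strip().rstrip("/")
    # Accept a bare host ("gitlab.com") or a full origin ("https://gitlab.com").
    if "://" not in host:
        host = f"https://{host}"
    return f"{host}/api/v4"


def gitlab_project_api_url(host: str, project_path: str) -> str:
    """Build the project endpoint using the URL-encoded project path."""
    # safe="" makes quote() encode "/" as %2F, as the GitLab API expects.
    return f"{gitlab_api_base_url(host)}/projects/{quote(project_path, safe='')}"
```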

check_auth(host='gitlab.com')

Check whether authentication works.

Parameters:

host (str)

Return type:

bool

verify_auth(host='gitlab.com')

Verify authentication and return detailed information.

Returns:

Dictionary with authentication details including user, scopes, and permissions.

Parameters:

host (str)

Return type:

dict

create_issue(repo_url, title, body)

Create an issue on GitLab.

Returns:

URL of the created issue (or a placeholder URL in dry-run mode)

Return type:

str

static parse_issue_url(issue_url)

Parse a GitLab issue URL and return host/owner/repo/iid.

Parameters:

issue_url (str)

Return type:

tuple[str, str, str, int]
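GitLab issue URLs usually look like https://<host>/<group>/<project>/-/issues/<iid>, with an older form that omits the /-/ segment, and projects can sit inside nested groups. A sketch under those assumptions, folding nested groups into the owner field; the function name and the ValueError behaviour are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlparse


def parse_gitlab_issue_url_sketch(issue_url: str) -> tuple[str, str, str, int]:
    """Parse https://<host>/<owner...>/<repo>/-/issues/<iid>."""
    parsed = urlparse(issue_url)
    parts = [p for p in parsed.path.split("/") if p]
    # Accept both the modern "/-/issues/" form and the legacy "/issues/" form.
    if "-" in parts:
        parts.remove("-")
    if len(parts) < 4 or parts[-2] != "issues" or not parts[-1].isdigit():
        raise ValueError(f"Not a GitLab issue URL: {issue_url}")
    *owner_parts, repo = parts[:-2]
    # Nested groups are joined back into a single owner path.
    return parsed.netloc, "/".join(owner_parts), repo, int(parts[-1])
```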

History Module

Helpers for loading and querying previous issue reports.

sw_metadata_bot.history.load_previous_report(report_path)

Load report.json and index issue-lifecycle entries by repository URL.

Parameters:

report_path (Path | None)

Return type:

dict[str, dict]

sw_metadata_bot.history.load_previous_commit_report(report_path)

Load report.json and index entries by repository for commit-based pre-skip.

Parameters:

report_path (Path | None)

Return type:

dict[str, dict]

sw_metadata_bot.history.findings_signature(pitfall_ids, warning_ids)

Build a deterministic findings signature from pitfall and warning IDs.

Return type:

str
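One way to make a findings signature deterministic is to canonicalize the ID lists (de-duplicate and sort) before hashing, so that reordering the same findings does not register as a change. This is a sketch under that assumption; the real signature format, including whether it is hashed at all and the choice of SHA-256 here, is not documented in this section.

```python
from __future__ import annotations

import hashlib


def findings_signature_sketch(pitfall_ids: list[str], warning_ids: list[str]) -> str:
    """Order-independent signature: de-duplicate, sort, then hash."""
    # Separate prefixes keep a pitfall and a warning with the same code distinct.
    canonical = (
        "P:" + ",".join(sorted(set(pitfall_ids)))
        + "|W:" + ",".join(sorted(set(warning_ids)))
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Comparing two such signatures gives a cheap "identical findings" check between runs.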

Incremental Module

Decision engine for incremental issue lifecycle handling.

class sw_metadata_bot.incremental.Decision(action, reason)

Bases: object

Decision outcome for a repository in incremental mode.

Parameters:
  • action (str)

  • reason (str)

action: str
reason: str
__init__(action, reason)
Return type:

None

sw_metadata_bot.incremental.evaluate(*, previous_exists, unsubscribed, repo_updated, has_findings, identical_findings, previous_issue_open, codemeta_missing, previous_codemeta_missing)

Evaluate the configured decision tree and return an action and a reason.

This function implements a cascading decision tree that determines whether to create a new issue, update an existing one with a comment, close it, or stop (skip). The logic prioritizes certain conditions to prevent unnecessary noise.

Decision Tree (evaluated in order):

1. NO PREVIOUS ANALYSIS
     action="create" (first-time analysis, always create issue)

2. UNSUBSCRIBE DETECTED
     action="stop" (user explicitly unsubscribed, respect their choice)

3. REPOSITORY NOT UPDATED
     action="stop" (no changes since last analysis, skip)

4. MISSING CODEMETA WITHOUT OTHER FINDINGS
     Check if codemeta status changed:
     - If issue open AND codemeta status unchanged:
         action="stop" (already reported, issue still relevant)
     - If issue open AND codemeta status changed:
         action="comment" (report that codemeta was added/removed)
     - If no issue open:
         action="create" (new codemeta issue)

5. NO FINDINGS (REPO IS CLEAN)
     - If issue is open:
         action="close" (metadata quality improved, close issue)
     - If no issue:
         action="stop" (nothing to report)

6. FINDINGS IDENTICAL TO PREVIOUS
     Check issue state:
     - If issue open:
         action="stop" (same issue already posted)
     - If issue closed:
         action="create" (quality got worse again after improvements)

7. FINDINGS CHANGED (DEFAULT CASE)
     Check issue state:
     - If issue open:
         action="comment" (update existing issue with new findings)
     - If no issue:
         action="create" (quality changed while issue is closed)

Parameters:
  • previous_exists (bool) – Whether a previous analysis snapshot exists

  • unsubscribed (bool) – Whether unsubscribe comment detected on existing issue

  • repo_updated (bool) – Whether repository has new commits since last analysis

  • has_findings (bool) – Whether current analysis found metadata issues

  • identical_findings (bool) – Whether findings are identical to previous run

  • previous_issue_open (bool) – Whether previously opened issue is still open

  • codemeta_missing (bool) – Whether codemeta.json is missing in current analysis

  • previous_codemeta_missing (bool) – Whether codemeta.json was missing in previous analysis

Returns:

Decision object with action and reason explaining the choice

Return type:

Decision

Note

This decision tree is intentionally conservative: for research software projects, it favors skipping an issue over creating duplicate noise. Changes to this logic should be discussed first, because they affect user experience.
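The numbered steps above map naturally onto a chain of early returns. The sketch below is an illustration, not the package's code: DecisionSketch stands in for Decision, the reason strings are hypothetical, and step 4 ("missing codemeta without other findings") is read as codemeta_missing being true while has_findings is false, which is an interpretation rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionSketch:
    action: str  # one of "create", "comment", "close", "stop"
    reason: str  # hypothetical reason codes, not the package's real ones


def evaluate_sketch(
    *,
    previous_exists: bool,
    unsubscribed: bool,
    repo_updated: bool,
    has_findings: bool,
    identical_findings: bool,
    previous_issue_open: bool,
    codemeta_missing: bool,
    previous_codemeta_missing: bool,
) -> DecisionSketch:
    # Steps 1-3: first run, opt-out, and unchanged repository.
    if not previous_exists:
        return DecisionSketch("create", "first_analysis")
    if unsubscribed:
        return DecisionSketch("stop", "unsubscribed")
    if not repo_updated:
        return DecisionSketch("stop", "repo_not_updated")
    # Step 4, read as "codemeta is missing and nothing else was found".
    if codemeta_missing and not has_findings:
        if not previous_issue_open:
            return DecisionSketch("create", "codemeta_missing")
        if codemeta_missing == previous_codemeta_missing:
            return DecisionSketch("stop", "codemeta_already_reported")
        return DecisionSketch("comment", "codemeta_status_changed")
    # Step 5: clean repository.
    if not has_findings:
        if previous_issue_open:
            return DecisionSketch("close", "repo_clean")
        return DecisionSketch("stop", "nothing_to_report")
    # Step 6: same findings as last time.
    if identical_findings:
        if previous_issue_open:
            return DecisionSketch("stop", "findings_unchanged")
        return DecisionSketch("create", "findings_reappeared")
    # Step 7: findings changed.
    if previous_issue_open:
        return DecisionSketch("comment", "findings_changed")
    return DecisionSketch("create", "findings_changed_issue_closed")
```

Keeping one early return per branch mirrors the numbered steps one-to-one, so the order of precedence stays visible when the tree is reviewed or changed.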

Main Module

CLI entry point for sw-metadata-bot.

sw_metadata_bot.main.main()

Entry point for the CLI.

Pipeline Module

Pipeline command to run analysis workflows.

sw_metadata_bot.pipeline.find_latest_previous_report(output_root, run_name, current_snapshot_tag)

Find the latest previous report path in the same run folder.

Parameters:
  • output_root (Path)

  • run_name (str)

  • current_snapshot_tag (str | None)

Return type:

Path | None

sw_metadata_bot.pipeline.run_pipeline(config_file, dry_run, snapshot_tag, previous_report, force_analysis=False)

Run analysis and write issue decision records without API side effects.

When force_analysis is True, the pipeline bypasses artifact reuse for unchanged repositories and treats them as if they had been updated.

Parameters:
  • config_file (Path)

  • dry_run (bool)

  • snapshot_tag (str | None)

  • previous_report (Path | None)

  • force_analysis (bool)

Return type:

None

Pitfalls Module

Pitfalls data loading and parsing.

sw_metadata_bot.pitfalls.load_pitfalls(file_path)

Load pitfalls from JSON-LD file.

Parameters:

file_path (Path)

Return type:

dict

sw_metadata_bot.pitfalls.get_repository_url(data)

Extract repository URL from pitfalls data.

Parameters:

data (dict)

Return type:

str

sw_metadata_bot.pitfalls.get_pitfalls_list(data)

Get list of pitfall checks from data.

Parameters:

data (dict)

Return type:

list[dict]

sw_metadata_bot.pitfalls.get_warnings_list(data)

Get list of warning checks from data.

Parameters:

data (dict)

Return type:

list[dict]

sw_metadata_bot.pitfalls.get_rsmetacheck_version(data)

Get the version of RSMetacheck used for the analysis. The version is read from checkingSoftware.softwareVersion; the function returns "unknown" if it is not found.

Parameters:

data (dict)

Return type:

str
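A sketch of the lookup with its fallback, assuming checkingSoftware is a top-level key holding a single object (not a list) and that an empty version string should also count as "not found". The function name is hypothetical.

```python
from __future__ import annotations

from typing import Any


def get_rsmetacheck_version_sketch(data: dict[str, Any]) -> str:
    """Return checkingSoftware.softwareVersion, or "unknown" if absent."""
    software = data.get("checkingSoftware")
    if isinstance(software, dict):
        version = software.get("softwareVersion")
        if isinstance(version, str) and version:
            return version
    return "unknown"
```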

sw_metadata_bot.pitfalls.format_report(repo_url, data, *, codemeta_missing=False, generated_codemeta=None)

Format pitfalls data into a readable report.

Parameters:
  • repo_url (str)

  • data (dict)

  • codemeta_missing (bool)

  • generated_codemeta (dict | None)

Return type:

str

sw_metadata_bot.pitfalls.create_issue_body(report, custom_message=None)

Wrap the report in the issue template, using the optional custom message or the default greeting.

Parameters:
  • report (str)

  • custom_message (str | None)

Return type:

str

Publish Module

Publish issues from an existing analysis snapshot.

sw_metadata_bot.publish.publish_analysis(analysis_root, retry_failed=False)

Publish issues from an existing analysis snapshot without re-running analysis.

Parameters:
  • analysis_root (Path)

  • retry_failed (bool)

Return type:

None

Token Resolver Module

Token resolution helpers for API clients.

sw_metadata_bot.token_resolver.resolve_token(*, explicit_token, env_var_name, dry_run)

Resolve token with precedence: explicit > env > .env fallback.

Parameters:
  • explicit_token (str | None)

  • env_var_name (str)

  • dry_run (bool)

Return type:

str | None
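The precedence explicit > environment > .env can be sketched as three lookups in order. This hypothetical version parses a simple KEY=VALUE .env file by hand to avoid a dependency; the dotenv_path parameter is an illustrative assumption, and the real function's dry_run parameter is omitted because its effect is not documented here.

```python
from __future__ import annotations

import os
from pathlib import Path


def _read_dotenv(path: Path) -> dict[str, str]:
    """Minimal KEY=VALUE parser; ignores blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_token_sketch(
    explicit_token: str | None, env_var_name: str, dotenv_path: Path = Path(".env")
) -> str | None:
    # 1. An explicitly passed token always wins.
    if explicit_token:
        return explicit_token
    # 2. Then the process environment.
    from_env = os.environ.get(env_var_name)
    if from_env:
        return from_env
    # 3. Finally the .env file; empty values are treated as missing.
    return _read_dotenv(dotenv_path).get(env_var_name) or None
```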

Verify Tokens Module

Token verification command: checks whether tokens have the correct permissions.