gmuse.git

Git utilities for extracting repository information.

This module wraps the git command line through subprocess calls to extract staged diffs, commit history, and repository metadata.

Public API:
  • is_git_repository: Check if directory is a git repo

  • get_repo_root: Get repository root path

  • get_staged_diff: Extract staged changes

  • get_commit_history: Fetch recent commits

  • get_current_branch: Get current branch information

  • truncate_diff: Truncate large diffs

  • load_repository_instructions: Load .gmuse file

Data Classes:
  • StagedDiff: Staged changes information

  • CommitRecord: Single commit data

  • CommitHistory: Collection of commits

  • RepositoryInstructions: Content from .gmuse file

  • BranchInfo: Current branch information

Module Contents

Classes

StagedDiff

Represents the git diff of staged changes.

CommitRecord

Single commit from repository history.

CommitHistory

Collection of recent commits for style context.

RepositoryInstructions

Project-level guidance from .gmuse file.

BranchInfo

Information about the current git branch.

Functions

_run_git

Execute a git command and return the result.

_count_diff_lines

Count added and removed lines in a diff.

_parse_commit_line

Parse a single commit log line into a CommitRecord.

_sanitize_branch_name

Sanitize branch name for prompt context.

_parse_branch_info

Parse branch name into type and summary.

is_git_repository

Check if the current/specified directory is a git repository.

get_repo_root

Get the root directory of the git repository.

get_staged_diff

Extract staged changes from git repository.

get_commit_history

Fetch recent commit messages for style context.

truncate_diff

Truncate diff to fit within token limits.

load_repository_instructions

Load project-level instructions from .gmuse file.

get_current_branch

Get information about the current git branch.

Data

logger

_GIT_TIMEOUT_SHORT

Timeout for quick git operations like rev-parse (seconds).

_GIT_TIMEOUT_LONG

Timeout for potentially slow git operations like diff/log (seconds).

API

gmuse.git.logger = 'get_logger(...)'

gmuse.git._GIT_TIMEOUT_SHORT: Final[int] = 5

Timeout for quick git operations like rev-parse (seconds).

gmuse.git._GIT_TIMEOUT_LONG: Final[int] = 30

Timeout for potentially slow git operations like diff/log (seconds).

class gmuse.git.StagedDiff

Represents the git diff of staged changes.

This dataclass encapsulates all information about staged changes in a git repository, including the raw diff content and computed metadata.

Attributes:

  • raw_diff: Full output of git diff --cached
  • files_changed: List of file paths that were modified
  • lines_added: Total lines added across all files
  • lines_removed: Total lines removed across all files
  • hash: SHA256 hash of raw_diff (useful for caching/deduplication)
  • size_bytes: Size of raw_diff in bytes
  • truncated: Whether diff was truncated to fit token limits

raw_diff: str = None
files_changed: list[str] = None
lines_added: int = None
lines_removed: int = None
hash: str = None
size_bytes: int = None
truncated: bool = False

class gmuse.git.CommitRecord

Single commit from repository history.

Attributes:

  • hash: Full git commit SHA (40 characters)
  • message: Commit message subject line
  • author: Commit author name
  • timestamp: Commit timestamp as datetime object

hash: str = None
message: str = None
author: str = None
timestamp: datetime.datetime = None

class gmuse.git.CommitHistory

Collection of recent commits for style context.

Used to provide the LLM with examples of the repository’s commit style to help generate consistent messages.

Attributes:

  • commits: Ordered list of recent commits (newest first)
  • depth: Number of commits requested (may differ from len(commits))
  • repository_path: Absolute path to git repository root

commits: list[gmuse.git.CommitRecord] = None
depth: int = None
repository_path: str = None

class gmuse.git.RepositoryInstructions

Project-level guidance from .gmuse file.

The .gmuse file allows repository maintainers to provide custom guidance for commit message generation.

Attributes:

  • content: Raw text content from .gmuse file (empty if not found)
  • file_path: Absolute path to .gmuse file
  • exists: Whether .gmuse file was found in the repository

content: str = None
file_path: str = None
exists: bool = None

class gmuse.git.BranchInfo

Information about the current git branch.

Used to provide context about the branch when generating commit messages. Branch names are sanitized to protect privacy and improve LLM understanding.

Attributes:

  • raw_name: Original branch name from git
  • branch_type: Extracted branch type (e.g., 'feature', 'fix', 'hotfix')
  • branch_summary: Sanitized summary of branch purpose (truncated, tickets masked)
  • is_default: Whether this is a default branch (main, master, develop)

raw_name: str = None
branch_type: Optional[str] = None
branch_summary: Optional[str] = None
is_default: bool = False

gmuse.git._run_git(*args: str, cwd: Optional[str] = None, timeout: int = _GIT_TIMEOUT_SHORT, check: bool = True) -> subprocess.CompletedProcess[str]

Execute a git command and return the result.

This is the core helper used by all git operations in this module.

Args:

  • *args: Git command arguments (without 'git' prefix)
  • cwd: Working directory for the command (None = current directory)
  • timeout: Command timeout in seconds
  • check: If True, raise CalledProcessError on non-zero exit code

Returns:

CompletedProcess with captured stdout/stderr

Raises:

  • subprocess.CalledProcessError: If check=True and command fails
  • subprocess.TimeoutExpired: If command exceeds timeout
  • FileNotFoundError: If git executable is not found
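
A helper with this signature could plausibly be built on `subprocess.run`, which raises exactly the three exceptions listed above. The sketch below is an assumption about the approach, not the module's actual code. The name `run_git_sketch` and the `executable` parameter are hypothetical; the latter exists only so the helper can be exercised on a machine without git.

```python
import subprocess
from typing import Optional


def run_git_sketch(
    *args: str,
    cwd: Optional[str] = None,
    timeout: int = 5,
    check: bool = True,
    executable: str = "git",  # hypothetical parameter, not part of gmuse.git
) -> "subprocess.CompletedProcess[str]":
    """Run `<executable> <args...>` and capture stdout/stderr as text."""
    return subprocess.run(
        [executable, *args],
        cwd=cwd,
        capture_output=True,  # populate .stdout and .stderr
        text=True,            # decode output to str
        timeout=timeout,      # subprocess.TimeoutExpired when exceeded
        check=check,          # subprocess.CalledProcessError on non-zero exit
    )
```

Passing the command as a list rather than a shell string avoids quoting problems with branch names and paths.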

gmuse.git._count_diff_lines(raw_diff: str) -> tuple[int, int]

Count added and removed lines in a diff.

Args:

raw_diff: Raw git diff output

Returns:

Tuple of (lines_added, lines_removed)
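
One common way to count changed lines in unified diff output is to look at each line's first character while skipping the `+++` and `---` file headers. The following is a minimal sketch under that assumption; `count_diff_lines_sketch` is a hypothetical name and the real function may count differently.

```python
from typing import Tuple


def count_diff_lines_sketch(raw_diff: str) -> Tuple[int, int]:
    """Return (lines_added, lines_removed) for unified diff text."""
    added = removed = 0
    for line in raw_diff.splitlines():
        # "+++ b/file" and "--- a/file" are file headers, not content changes.
        if line.startswith("+") and not line.startswith("+++"):
            added += 1
        elif line.startswith("-") and not line.startswith("---"):
            removed += 1
    return added, removed
```

This prefix check is a heuristic: an added line whose own content begins with `++` would look like a header and be skipped.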

gmuse.git._parse_commit_line(line: str) -> Optional[gmuse.git.CommitRecord]

Parse a single commit log line into a CommitRecord.

Expected format: hash|author|timestamp|message

Args:

line: Raw log line from git

Returns:

CommitRecord if parsing succeeds, None otherwise
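
The `hash|author|timestamp|message` layout can be split with a bounded `str.split`, so that a `|` inside the commit message survives. This sketch assumes the timestamp is strict ISO 8601 with a numeric UTC offset (as git's `%aI` placeholder typically emits); the source does not state the timestamp format. `CommitRecordSketch` and `parse_commit_line_sketch` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CommitRecordSketch:
    """Stand-in for gmuse.git.CommitRecord, same four fields."""
    hash: str
    message: str
    author: str
    timestamp: datetime


def parse_commit_line_sketch(line: str) -> Optional[CommitRecordSketch]:
    # maxsplit=3 keeps any "|" characters inside the message intact.
    parts = line.strip().split("|", 3)
    if len(parts) != 4:
        return None
    commit_hash, author, raw_timestamp, message = parts
    try:
        timestamp = datetime.fromisoformat(raw_timestamp)  # assumed ISO 8601
    except ValueError:
        return None
    return CommitRecordSketch(commit_hash, message, author, timestamp)
```

An author name that itself contains `|` would still shift the fields, so a separator that cannot occur in names may be a safer choice.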

gmuse.git._sanitize_branch_name(branch_name: str, max_length: int = 60) -> str

Sanitize branch name for prompt context.

Normalizes separators, converts to lowercase, and removes usernames and long hashes. Masks potential ticket IDs (e.g., JIRA-123 -> ticket-xxx).

Args:

  • branch_name: Raw branch name from git
  • max_length: Maximum length for sanitized name (default: 60)

Returns:

Sanitized branch name suitable for LLM context

Example:
>>> _sanitize_branch_name("feature/USER-123-add-auth")
'feature/ticket-xxx/add-auth'
>>> _sanitize_branch_name("fix/PROJ-456/update-api")
'fix/ticket-xxx/update-api'
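
The ticket-masking step can be illustrated with a regular expression. This sketch covers only lowercasing, masking, and truncation; separator normalization and the removal of usernames and long hashes are left out, so it will not reproduce every output shown above. The pattern and the name `mask_tickets_sketch` are assumptions, not the module's code.

```python
import re

# Assumed ticket shape: a letter-led key, a hyphen, then digits (e.g. proj-456).
_TICKET_RE = re.compile(r"\b[a-z][a-z0-9]*-\d+\b")


def mask_tickets_sketch(branch_name: str, max_length: int = 60) -> str:
    lowered = branch_name.lower()
    masked = _TICKET_RE.sub("ticket-xxx", lowered)
    return masked[:max_length]  # hard cut at max_length characters
```

The trailing `\b` keeps segments such as `add-2fa` from being mistaken for a ticket ID.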

gmuse.git._parse_branch_info(branch_name: str, max_length: int = 60) -> tuple[Optional[str], Optional[str]]

Parse branch name into type and summary.

Extracts branch type (feature, fix, hotfix, etc.) and summary from common branch naming patterns like 'type/description' or 'type-description'.

Args:

  • branch_name: Raw branch name from git
  • max_length: Maximum length for branch summary (default: 60)

Returns:

Tuple of (branch_type, branch_summary), both can be None

Example:
>>> _parse_branch_info("feature/add-authentication")
('feature', 'add-authentication')
>>> _parse_branch_info("fix/PROJ-123-bug-in-api")
('fix', 'ticket-xxx-bug-in-api')
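
Splitting on the first separator is one simple way to recognize the 'type/description' and 'type-description' patterns. The sketch below assumes a fixed set of known branch types, which the source does not enumerate, and it skips the sanitization applied to the summary. `KNOWN_TYPES` and `parse_branch_info_sketch` are hypothetical.

```python
from typing import Optional, Tuple

# Assumed list of recognized prefixes; the real set is not documented here.
KNOWN_TYPES = {"feature", "fix", "hotfix", "bugfix", "chore", "docs", "release"}


def parse_branch_info_sketch(
    branch_name: str, max_length: int = 60
) -> Tuple[Optional[str], Optional[str]]:
    for separator in ("/", "-"):
        prefix, found, rest = branch_name.partition(separator)
        if found and rest and prefix.lower() in KNOWN_TYPES:
            return prefix.lower(), rest[:max_length]
    return None, None  # no recognizable type prefix
```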

gmuse.git.is_git_repository(path: Optional[pathlib.Path] = None) -> bool

Check if the current/specified directory is a git repository.

Args:

path: Directory to check, defaults to current directory

Returns:

True if directory is a git repository, False otherwise

Example:
>>> is_git_repository()
True
>>> is_git_repository(Path("/tmp"))
False
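
A check like this is often implemented with `git rev-parse --is-inside-work-tree`, treating any failure as "not a repository". The sketch below assumes that approach and a 5-second timeout matching `_GIT_TIMEOUT_SHORT`; the module may use a different command. `is_git_repository_sketch` is a hypothetical name.

```python
import subprocess
from pathlib import Path
from typing import Optional


def is_git_repository_sketch(path: Optional[Path] = None) -> bool:
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=str(path) if path is not None else None,
            capture_output=True,
            text=True,
            timeout=5,
            check=False,  # a non-zero exit just means "not a repository"
        )
    except (OSError, subprocess.TimeoutExpired):
        # git missing, directory missing, or the command hung.
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"
```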

gmuse.git.get_repo_root(path: Optional[pathlib.Path] = None) -> pathlib.Path

Get the root directory of the git repository.

Args:

path: Directory to start from, defaults to current directory

Returns:

Path to repository root

Raises:

NotAGitRepositoryError: If not in a git repository

Example:
>>> root = get_repo_root()
>>> print(root)
/home/user/my-project

gmuse.git.get_staged_diff() -> gmuse.git.StagedDiff

Extract staged changes from git repository.

Returns:

StagedDiff object with diff content and metadata

Raises:

  • NotAGitRepositoryError: If not in a git repository
  • NoStagedChangesError: If no files are staged

Example:
>>> diff = get_staged_diff()
>>> print(diff.files_changed)
['src/main.py', 'tests/test_main.py']
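
The `hash` and `size_bytes` fields of StagedDiff are both derived from the raw diff text. As a rough sketch, assuming UTF-8 encoding and a hexadecimal digest (neither is stated in the source), they could be computed like this. `diff_metadata_sketch` is a hypothetical helper.

```python
import hashlib
from typing import Dict, Union


def diff_metadata_sketch(raw_diff: str) -> Dict[str, Union[str, int]]:
    encoded = raw_diff.encode("utf-8")  # assumed encoding
    return {
        "hash": hashlib.sha256(encoded).hexdigest(),  # 64 hex characters
        "size_bytes": len(encoded),                   # bytes, not characters
    }
```

Measuring the encoded bytes rather than `len(raw_diff)` matters for diffs with non-ASCII content, where one character can occupy several bytes.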

gmuse.git.get_commit_history(depth: int = 5) -> gmuse.git.CommitHistory

Fetch recent commit messages for style context.

Args:

depth: Number of commits to fetch (default: 5)

Returns:

CommitHistory object with recent commits

Raises:

NotAGitRepositoryError: If not in a git repository

Example:
>>> history = get_commit_history(depth=10)
>>> for commit in history.commits:
...     print(commit.message)

gmuse.git.truncate_diff(diff: gmuse.git.StagedDiff, max_bytes: int = 20000) -> gmuse.git.StagedDiff

Truncate diff to fit within token limits.

Strategy:
  • Keep file headers (diff --git, ---, +++)
  • Keep as many lines as possible within byte limit
  • Add truncation marker when limit is reached
  • Preserve structure for LLM understanding

Args:

  • diff: StagedDiff to truncate
  • max_bytes: Maximum size in bytes (default: 20000 ≈ 5000 tokens)

Returns:

New StagedDiff with truncated content and truncated=True

Example:
>>> large_diff = get_staged_diff()
>>> truncated = truncate_diff(large_diff, max_bytes=10000)
>>> print(truncated.truncated)
True
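
The strategy described above can be sketched on a plain diff string. This is one possible reading, not the module's implementation: file headers are always kept, body lines are kept until the byte budget runs out, and a marker is appended. The marker text and the name `truncate_raw_diff_sketch` are hypothetical.

```python
from typing import Tuple

MARKER = "\n... [diff truncated] ...\n"  # hypothetical marker text
HEADER_PREFIXES = ("diff --git", "--- ", "+++ ")


def truncate_raw_diff_sketch(raw_diff: str, max_bytes: int = 20000) -> Tuple[str, bool]:
    """Return (text, truncated) for a unified diff string."""
    if len(raw_diff.encode("utf-8")) <= max_bytes:
        return raw_diff, False
    kept = []
    used = 0
    limit_hit = False
    for line in raw_diff.splitlines():
        cost = len(line.encode("utf-8")) + 1  # +1 for the newline
        if line.startswith(HEADER_PREFIXES):
            kept.append(line)  # headers are exempt from the budget
            used += cost
        elif not limit_hit and used + cost <= max_bytes:
            kept.append(line)
            used += cost
        else:
            limit_hit = True   # drop this and all later body lines
    return "\n".join(kept) + MARKER, True
```

Because headers are exempt from the budget, the result can slightly exceed `max_bytes` when a diff touches many files; a stricter variant would reserve space for headers up front.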

gmuse.git.load_repository_instructions() -> gmuse.git.RepositoryInstructions

Load project-level instructions from .gmuse file.

The .gmuse file allows repository maintainers to provide custom guidance for commit message generation (e.g., preferred formats, conventions).

Returns:

RepositoryInstructions object with content from .gmuse file

Raises:

NotAGitRepositoryError: If not in a git repository

Example:
>>> instructions = load_repository_instructions()
>>> if instructions.exists:
...     print(instructions.content)
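
Loading the file amounts to a guarded read that reports absence instead of raising. The sketch below assumes the .gmuse file sits at the repository root and is UTF-8 encoded; it takes the root as an argument and returns a plain dict so it can run without git. `load_instructions_sketch` is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, Union


def load_instructions_sketch(repo_root: Path) -> Dict[str, Union[str, bool]]:
    file_path = repo_root / ".gmuse"  # assumed location: repository root
    if file_path.is_file():
        content = file_path.read_text(encoding="utf-8")
        exists = True
    else:
        content, exists = "", False  # empty content when the file is absent
    return {"content": content, "file_path": str(file_path), "exists": exists}
```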

gmuse.git.get_current_branch(max_length: int = 60) -> Optional[gmuse.git.BranchInfo]

Get information about the current git branch.

Extracts the current branch name and parses it into structured information for use as context in commit message generation. Returns None if the repository is in a detached HEAD state or if branch detection fails.

Args:

max_length: Maximum length for branch summary (default: 60)

Returns:

BranchInfo object with parsed branch information, or None if unavailable

Raises:

NotAGitRepositoryError: If not in a git repository

Example:
>>> branch = get_current_branch()
>>> if branch and not branch.is_default:
...     print(f"Type: {branch.branch_type}, Summary: {branch.branch_summary}")