mcma-backend/app/domain/entities/metadata.py
Senko-san c72d19599a
feat(enrichment): tag-first metadata pipeline (§1D)
Implements the §6.2 enrichment pipeline: embedded tags → Chromaprint
fingerprint → AcoustID lookup. Well-tagged files get correct
artist/album/title offline; the rest are identified via AcoustID
(which also yields a MusicBrainz recording id in one call).

- domain: AudioTags/Fingerprint/RecordingMatch value objects; ports
  AudioTagReader, AudioFingerprinter, AcoustIdClient; TrackRepository
  .apply_enrichment (gap-fill, never erases) + AlbumRepository.get_or_create
- infrastructure/metadata: MutagenTagReader, FpcalcFingerprinter,
  AcoustIdHttpClient (rich meta=recordings+releasegroups, throttled)
- application: MetadataEnrichmentService — tags preferred, AcoustID fills
  gaps; resolves artist/album; status enriched/failed; skips manual;
  every external step wrapped (graceful degradation)
- workers: enrich_task registered; enqueue_enrich is best-effort and
  deferred so the caller's txn commits before the worker reads the row
- wiring: upload enqueues after add; import returns imported_ids and
  enqueues post-commit (mid-scan would race the worker); manual
  POST /tracks/{id}/metadata/enrich endpoint
- deps: add mutagen (fpcalc/ffmpeg already in the image)

Tests: metadata service orchestration, AcoustID parser, tag helpers.
125 passed; mypy strict + ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 13:04:02 +03:00

54 lines
1.6 KiB
Python

"""Value objects for the metadata-enrichment pipeline (plan §6.2).

Pure data carriers between the enrichment service and its adapters (tag reader,
fingerprinter, AcoustID). No framework imports — these cross the domain boundary.
"""

from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class AudioTags:
    """Embedded tags read from the file itself (ID3 / Vorbis / MP4 …).

    Every field is optional — files are tagged inconsistently. The reader fills
    what it can and leaves the rest ``None`` for downstream identification.
    """

    title: str | None = None
    artist: str | None = None
    album: str | None = None
    album_artist: str | None = None
    genre: str | None = None
    year: int | None = None
    track_number: int | None = None
    duration_seconds: int | None = None
    bitrate: int | None = None


@dataclass(frozen=True, slots=True)
class Fingerprint:
    """Chromaprint fingerprint plus the decoded duration (both needed by AcoustID)."""

    fingerprint: str
    duration_seconds: int


@dataclass(frozen=True, slots=True)
class RecordingMatch:
    """A single AcoustID result, flattened to the fields enrichment cares about.

    ``acoustid`` is the stable AcoustID identifier (a UUID) — used as the
    dedup key persisted on ``track.acoustid_fingerprint`` (fits the 64-char
    column; the raw fingerprint does not). ``recording_mbid`` is the MusicBrainz
    recording id when present.
    """

    acoustid: str
    score: float
    recording_mbid: str | None = None
    title: str | None = None
    artist: str | None = None
    album: str | None = None
    year: int | None = None