feat(enrichment): tag-first metadata pipeline (§1D)

Implements the §6.2 enrichment pipeline: embedded tags → Chromaprint
fingerprint → AcoustID lookup. Well-tagged files get correct
artist/album/title offline; the rest are identified via AcoustID
(which also yields a MusicBrainz recording id in one call).
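
The ordering described above can be sketched as follows. This is a minimal illustration, not the service in this commit: `Tags`, `enrich`, and the three callables are hypothetical stand-ins for AudioTags, MetadataEnrichmentService, and the reader/fingerprinter/client ports, whose real signatures are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Tags:
    # Hypothetical stand-in for the AudioTags value object.
    artist: Optional[str] = None
    album: Optional[str] = None
    title: Optional[str] = None

    def is_complete(self) -> bool:
        return all((self.artist, self.album, self.title))


def enrich(
    path: str,
    read_tags: Callable[[str], Tags],
    fingerprint: Callable[[str], str],
    lookup: Callable[[str], Optional[Tags]],
) -> Tags:
    """Tag-first: fingerprint and look up remotely only when tags leave gaps."""
    tags = read_tags(path)
    if tags.is_complete():
        return tags  # offline path, no fingerprint and no network call
    try:
        match = lookup(fingerprint(path))
    except Exception:
        match = None  # graceful degradation: keep whatever the tags gave us
    if match is None:
        return tags
    # Embedded tags are preferred; the lookup result only fills the gaps.
    return Tags(
        artist=tags.artist or match.artist,
        album=tags.album or match.album,
        title=tags.title or match.title,
    )
```

Passing the steps in as callables keeps the sketch free of mutagen, fpcalc, and HTTP details, roughly the role the ports play in the commit.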

- domain: AudioTags/Fingerprint/RecordingMatch value objects; ports
  AudioTagReader, AudioFingerprinter, AcoustIdClient;
  TrackRepository.apply_enrichment (gap-fill, never erases) +
  AlbumRepository.get_or_create
- infrastructure/metadata: MutagenTagReader, FpcalcFingerprinter,
  AcoustIdHttpClient (rich meta=recordings+releasegroups, throttled)
- application: MetadataEnrichmentService — tags preferred, AcoustID fills
  gaps; resolves artist/album; status enriched/failed; skips manual;
  every external step wrapped (graceful degradation)
- workers: enrich_task registered; enqueue_enrich is best-effort and
  deferred so the caller's txn commits before the worker reads the row
- wiring: upload enqueues after add; import returns imported_ids and
  enqueues post-commit (mid-scan would race the worker); manual
  POST /tracks/{id}/metadata/enrich endpoint
- deps: add mutagen (fpcalc/ffmpeg already in the image)

Tests: metadata service orchestration, AcoustID parser, tag helpers.
125 passed; mypy strict + ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Senko-san
2026-06-09 13:04:02 +03:00
parent 48e3418c7f
commit c72d19599a
24 changed files with 1934 additions and 763 deletions
+16 -6
@@ -9,7 +9,7 @@ must not abort the whole scan (graceful degradation).
 import contextlib
 import uuid
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from app.core.logging import get_logger
 from app.domain.ports import ArtistRepository, FileStorage, IndexableSource, TrackRepository
@@ -27,6 +27,9 @@ class ImportSummary:
     imported: int
     skipped: int
     failed: int
+    # IDs of freshly imported tracks, for the caller to enqueue enrichment
+    # *after* its transaction commits (enqueuing mid-scan would race the worker).
+    imported_ids: list[uuid.UUID] = field(default_factory=list)
 class LibraryImportService:
@@ -44,7 +47,8 @@ class LibraryImportService:
     async def scan_and_import(
         self, source: IndexableSource, *, added_by: uuid.UUID | None
     ) -> ImportSummary:
-        seen = imported = skipped = failed = 0
+        seen = skipped = failed = 0
+        imported_ids: list[uuid.UUID] = []
         for file in source.scan():
             seen += 1
             try:
@@ -52,13 +56,18 @@ class LibraryImportService:
                 if existing is not None:
                     skipped += 1
                     continue
-                await self._import_one(source.name, file, added_by)
-                imported += 1
+                track_id = await self._import_one(source.name, file, added_by)
+                imported_ids.append(track_id)
             except Exception:
                 failed += 1
                 log.warning("import_file_failed", source=source.name, source_id=file.source_id)
         summary = ImportSummary(
-            source=source.name, seen=seen, imported=imported, skipped=skipped, failed=failed
+            source=source.name,
+            seen=seen,
+            imported=len(imported_ids),
+            skipped=skipped,
+            failed=failed,
+            imported_ids=imported_ids,
         )
         log.info(
             "import_complete",
@@ -72,7 +81,7 @@ class LibraryImportService:
     async def _import_one(
         self, source_name: str, file: SourceFile, added_by: uuid.UUID | None
-    ) -> None:
+    ) -> uuid.UUID:
         track_id = uuid.uuid4()
         key = f"tracks/{str(track_id)[:2]}/{track_id}.{file.file_format}"
         await self._storage.save_file(key, file.path)
@@ -94,3 +103,4 @@ class LibraryImportService:
             with contextlib.suppress(Exception):
                 await self._storage.delete(key)
             raise
+        return track_id
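
For readers wondering how a caller might consume `imported_ids`, here is a rough sketch of the post-commit enqueue described in the commit message. It is an assumption-laden illustration: `FakeSession`, `run_import`, and the `enqueue_enrich` callable signature are hypothetical, and `ImportSummary` is trimmed to the two fields used here. The real wiring in this commit is not shown in the diff above.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class ImportSummary:
    # Trimmed copy of the dataclass from the diff; only the fields used here.
    imported: int
    imported_ids: list[uuid.UUID] = field(default_factory=list)


class FakeSession:
    """Hypothetical unit of work that records whether commit() has run."""

    def __init__(self) -> None:
        self.committed = False

    async def commit(self) -> None:
        self.committed = True


async def run_import(session, scan, enqueue_enrich) -> ImportSummary:
    """Run the scan, commit, and only then hand the new IDs to the worker queue."""
    summary = await scan()
    await session.commit()  # rows become visible to the worker from here on
    for track_id in summary.imported_ids:
        try:
            enqueue_enrich(track_id)
        except Exception:
            # Best-effort: a queue outage must not fail the import itself.
            pass
    return summary
```

Enqueuing inside the scan loop would let a worker pick up a track ID before the row is committed, which is the race the comment on `imported_ids` warns about.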