feat(enrichment): tag-first metadata pipeline (§1D)

Implements the §6.2 enrichment pipeline: embedded tags → Chromaprint fingerprint → AcoustID lookup. Well-tagged files get correct artist/album/title offline; the rest are identified via AcoustID (which also yields a MusicBrainz recording id in one call).

- domain: AudioTags/Fingerprint/RecordingMatch value objects; ports AudioTagReader, AudioFingerprinter, AcoustIdClient; TrackRepository.apply_enrichment (gap-fill, never erases) + AlbumRepository.get_or_create
- infrastructure/metadata: MutagenTagReader, FpcalcFingerprinter, AcoustIdHttpClient (rich meta=recordings+releasegroups, throttled)
- application: MetadataEnrichmentService: tags preferred, AcoustID fills gaps; resolves artist/album; status enriched/failed; skips manual; every external step wrapped (graceful degradation)
- workers: enrich_task registered; enqueue_enrich is best-effort and deferred so the caller's txn commits before the worker reads the row
- wiring: upload enqueues after add; import returns imported_ids and enqueues post-commit (mid-scan would race the worker); manual POST /tracks/{id}/metadata/enrich endpoint
- deps: add mutagen (fpcalc/ffmpeg already in the image)

Tests: metadata service orchestration, AcoustID parser, tag helpers.
125 passed; mypy strict + ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 13:04:02 +03:00
parent 48e3418c7f
commit c72d19599a
24 changed files with 1934 additions and 763 deletions
@@ -9,7 +9,7 @@ must not abort the whole scan (graceful degradation).

 import contextlib
 import uuid
-from dataclasses import dataclass
+from dataclasses import dataclass, field

 from app.core.logging import get_logger
 from app.domain.ports import ArtistRepository, FileStorage, IndexableSource, TrackRepository
@@ -27,6 +27,9 @@ class ImportSummary:
    imported: int
    skipped: int
    failed: int
+    # IDs of freshly imported tracks, for the caller to enqueue enrichment
+    # *after* its transaction commits (enqueuing mid-scan would race the worker).
+    imported_ids: list[uuid.UUID] = field(default_factory=list)


 class LibraryImportService:
@@ -44,7 +47,8 @@ class LibraryImportService:
    async def scan_and_import(
        self, source: IndexableSource, *, added_by: uuid.UUID | None
    ) -> ImportSummary:
-        seen = imported = skipped = failed = 0
+        seen = skipped = failed = 0
+        imported_ids: list[uuid.UUID] = []
        for file in source.scan():
            seen += 1
            try:
@@ -52,13 +56,18 @@ class LibraryImportService:
                if existing is not None:
                    skipped += 1
                    continue
-                await self._import_one(source.name, file, added_by)
-                imported += 1
+                track_id = await self._import_one(source.name, file, added_by)
+                imported_ids.append(track_id)
            except Exception:
                failed += 1
                log.warning("import_file_failed", source=source.name, source_id=file.source_id)
        summary = ImportSummary(
-            source=source.name, seen=seen, imported=imported, skipped=skipped, failed=failed
+            source=source.name,
+            seen=seen,
+            imported=len(imported_ids),
+            skipped=skipped,
+            failed=failed,
+            imported_ids=imported_ids,
        )
        log.info(
            "import_complete",
@@ -72,7 +81,7 @@ class LibraryImportService:

    async def _import_one(
        self, source_name: str, file: SourceFile, added_by: uuid.UUID | None
-    ) -> None:
+    ) -> uuid.UUID:
        track_id = uuid.uuid4()
        key = f"tracks/{str(track_id)[:2]}/{track_id}.{file.file_format}"
        await self._storage.save_file(key, file.path)
@@ -94,3 +103,4 @@ class LibraryImportService:
            with contextlib.suppress(Exception):
                await self._storage.delete(key)
            raise
+        return track_id
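
A minimal sketch of how a caller might consume `imported_ids`, assuming it commits first and enqueues afterwards as the commit message describes. `FakeSession` and `run_import` are hypothetical stand-ins (not part of this diff), and the `ImportSummary` here is trimmed to the two fields the example needs; the queue is modelled as a plain list of events.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class ImportSummary:
    # Trimmed stand-in for the ImportSummary shown in the diff.
    imported: int
    imported_ids: list[uuid.UUID] = field(default_factory=list)


class FakeSession:
    """Hypothetical DB session; it only records that commit() happened."""

    def __init__(self, events: list[str]) -> None:
        self.events = events

    async def commit(self) -> None:
        self.events.append("commit")


async def run_import(session: FakeSession, events: list[str]) -> ImportSummary:
    # Pretend the scan imported two tracks inside the open transaction.
    ids = [uuid.uuid4(), uuid.uuid4()]
    summary = ImportSummary(imported=len(ids), imported_ids=ids)

    # Commit first, so the new rows are visible to other connections.
    await session.commit()

    # Only then hand the IDs to the queue; a worker that starts right away
    # can find the committed row instead of racing the transaction.
    for track_id in summary.imported_ids:
        events.append(f"enqueue:{track_id}")
    return summary


events: list[str] = []
summary = asyncio.run(run_import(FakeSession(events), events))
```

Collecting the IDs during the scan and enqueuing them in one pass afterwards keeps the ordering explicit: `commit` is always the first recorded event, followed by one enqueue per imported track.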