feat(enrichment): tag-first metadata pipeline (§1D)

Implements the §6.2 enrichment pipeline: embedded tags → Chromaprint
fingerprint → AcoustID lookup. Well-tagged files get correct
artist/album/title offline; the rest are identified via AcoustID (which
also yields a MusicBrainz recording id in one call).

- domain: AudioTags/Fingerprint/RecordingMatch value objects; ports
  AudioTagReader, AudioFingerprinter, AcoustIdClient;
  TrackRepository.apply_enrichment (gap-fill, never erases) +
  AlbumRepository.get_or_create
- infrastructure/metadata: MutagenTagReader, FpcalcFingerprinter,
  AcoustIdHttpClient (rich meta=recordings+releasegroups, throttled)
- application: MetadataEnrichmentService: tags preferred, AcoustID fills
  gaps; resolves artist/album; status enriched/failed; skips manual;
  every external step wrapped (graceful degradation)
- workers: enrich_task registered; enqueue_enrich is best-effort and
  deferred so the caller's txn commits before the worker reads the row
- wiring: upload enqueues after add; import returns imported_ids and
  enqueues post-commit (mid-scan would race the worker); manual
  POST /tracks/{id}/metadata/enrich endpoint
- deps: add mutagen (fpcalc/ffmpeg already in the image)

Tests: metadata service orchestration, AcoustID parser, tag helpers.
125 passed; mypy strict + ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 13:04:02 +03:00
parent 48e3418c7f
commit c72d19599a
24 changed files with 1934 additions and 763 deletions
@@ -173,3 +173,47 @@ class SqlAlchemyTrackRepository:
        await self._session.flush()
        await self._session.refresh(row)
        return _to_entity(row)
+
+    async def apply_enrichment(
+        self,
+        track_id: uuid.UUID,
+        *,
+        title: str,
+        artist_id: uuid.UUID,
+        album_id: uuid.UUID | None,
+        genre: str | None,
+        year: int | None,
+        track_number: int | None,
+        duration_seconds: int | None,
+        bitrate: int | None,
+        acoustid_fingerprint: str | None,
+        musicbrainz_id: str | None,
+        metadata_status: str,
+    ) -> Track:
+        row = await self._session.get(TrackModel, track_id)
+        if row is None:
+            raise NotFoundError(f"Track {track_id} not found.")
+        # Identity + status are authoritative for an enrichment run.
+        row.title = title
+        row.artist_id = artist_id
+        row.metadata_status = metadata_status
+        # Nullable extras: fill gaps only — never erase data a prior run found.
+        if album_id is not None:
+            row.album_id = album_id
+        if genre is not None:
+            row.genre = genre
+        if year is not None:
+            row.year = year
+        if track_number is not None:
+            row.track_number = track_number
+        if duration_seconds is not None:
+            row.duration_seconds = duration_seconds
+        if bitrate is not None:
+            row.bitrate = bitrate
+        if acoustid_fingerprint is not None:
+            row.acoustid_fingerprint = acoustid_fingerprint
+        if musicbrainz_id is not None:
+            row.musicbrainz_id = musicbrainz_id
+        await self._session.flush()
+        await self._session.refresh(row)
+        return _to_entity(row)
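
The gap-fill rule in `apply_enrichment` (a `None` argument leaves the stored value alone, a non-`None` argument overwrites it) can be shown in isolation. The following is a minimal sketch, not the repository code: `TrackRow` and `fill_gaps` are hypothetical names, and a plain dataclass stands in for the SQLAlchemy `TrackModel` so the example runs without a database.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class TrackRow:
    """Hypothetical stand-in for the ORM row; only a few fields are modelled."""

    title: str
    metadata_status: str
    genre: str | None = None
    year: int | None = None


def fill_gaps(row: TrackRow, **extras: Any) -> TrackRow:
    """Copy non-None extras onto the row; None means 'no new information'."""
    for name, value in extras.items():
        if not hasattr(row, name):
            # Fail loudly on a misspelled field instead of silently adding it.
            raise AttributeError(f"TrackRow has no field {name!r}")
        if value is not None:
            setattr(row, name, value)
    return row


# A prior run found the genre; this run found only the year.
row = TrackRow(title="Untitled", metadata_status="pending", genre="Jazz")
fill_gaps(row, genre=None, year=1959)

# A later run with a concrete genre still overwrites the stored one.
later = TrackRow(title="Untitled", metadata_status="pending", genre="Jazz")
fill_gaps(later, genre="Bebop")
```

One consequence of this design, visible in the sketch, is that an enrichment run cannot reset a nullable field to `None`; clearing a value would need a separate code path.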