fix: skip invalid episode_ids in semantic ingestion instead of crashing#1128
fix: skip invalid episode_ids in semantic ingestion instead of crashing#1128haosenwang1018 wants to merge 2 commits intoMemMachine:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses issue #1102 by changing the semantic ingestion process to gracefully skip invalid episode_ids instead of crashing with a ValueError. The change replaces raising an exception with logging a warning when episode_ids referenced in semantic storage have been deleted or are otherwise invalid.
Changes:
- Replace ValueError exception with warning log when invalid episode_ids are encountered
- Filter out None messages before processing to allow valid messages to continue
Comments suppressed due to low confidence (1)
src/memmachine/semantic_memory/semantic_ingestion.py:117
- When all messages are invalid (all episode_ids return None), the filtered messages list will be empty, and processing will skip marking the invalid episode_ids as ingested. This creates an infinite retry loop. Consider adding an early return after marking invalid episode_ids as ingested: if len(raw_messages) == 0: return
raw_messages = [m for m in raw_messages if m is not None]
messages = TypeAdapter(list[Episode]).validate_python(raw_messages)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| set_id, | ||
| none_h_ids, | ||
| ) | ||
|
|
There was a problem hiding this comment.
Invalid episode_ids are not being marked as ingested, which will cause them to be retried indefinitely. According to the stored memory (src/memmachine/semantic_memory/semantic_ingestion.py lines 109-120), invalid episode_ids should be marked as ingested to prevent repeated failures. After logging the warning, you should mark these invalid episode_ids as ingested by calling: await self._semantic_storage.mark_messages_ingested(set_id=set_id, history_ids=none_h_ids)
| await self._semantic_storage.mark_messages_ingested( | |
| set_id=set_id, | |
| history_ids=none_h_ids, | |
| ) |
| set_id, | ||
| none_h_ids, | ||
| ) | ||
|
|
There was a problem hiding this comment.
This change introduces new behavior (skipping invalid episode_ids) but lacks test coverage. Consider adding a test case that verifies: 1) invalid episode_ids are logged with a warning, 2) invalid episode_ids are marked as ingested to prevent reprocessing, 3) valid messages are still processed correctly when some episode_ids are invalid, and 4) the function handles gracefully when all episode_ids are invalid.
| await self._semantic_storage.mark_messages_ingested( | |
| set_id=set_id, | |
| history_ids=none_h_ids, | |
| ) |
…inite retry Signed-off-by: haosenwang1018 <haosenwang1018@users.noreply.github.com>
Closes #1102
When episode_ids referenced in semantic storage have been deleted or are otherwise invalid, the ingestion process crashes with a
ValueError. This change gracefully skips invalid episode_ids with a warning log instead of failing entirely, allowing valid messages to still be processed.As suggested in the issue discussion.