Port 'Adjust placement of paragraph markers' from machine.py #435
Copilot wants to merge 3 commits into …
@copilot The build is failing.
Fixed in the latest commit: the two new files (…)
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #435      +/-   ##
==========================================
+ Coverage   73.18%   73.28%   +0.10%
==========================================
  Files         440      441       +1
  Lines       36882    37055     +173
  Branches     5075     5094      +19
==========================================
+ Hits        26991    27155     +164
- Misses       8778     8781       +3
- Partials     1113     1119       +6
Enkidu93 left a comment:
@Enkidu93 reviewed 4 files and all commit messages, and made 1 comment.
Reviewable status: complete! All files reviewed, all discussions resolved (waiting on ddaspit).
Ports machine.py#298 — after alignment-based placement of paragraph markers, apply small boundary adjustments to produce more natural splits (e.g. keeping a trailing comma with its sentence rather than letting it open the next paragraph).
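To make the idea concrete, here is a minimal Python sketch of a token-level boundary adjustment. It is an illustration only, not the code in this PR: the function name `adjust_boundary`, the `TRAILING_PUNCTUATION` set, and the convention that `boundary` is the index of the first token of the next segment are all assumptions made for the example.

```python
# Hypothetical sketch: names and the punctuation set are illustrative,
# not the API introduced by this PR.
TRAILING_PUNCTUATION = {",", ";", ".", "?", "!", ")", "]", "\u201d", "\u2019"}


def adjust_boundary(boundary: int, tokens: list[str]) -> int:
    """Move a segment boundary right, past punctuation that belongs to the
    preceding segment.

    `boundary` is assumed to be the index of the first token of the next
    segment, so tokens[:boundary] is the current segment.
    """
    if boundary <= 0:
        # There is no preceding segment to attach punctuation to.
        return boundary
    while boundary < len(tokens) and tokens[boundary] in TRAILING_PUNCTUATION:
        boundary += 1
    return boundary


tokens = ["Este", "texto", "está", "en", "inglés", ",", "y", "esta", "prueba"]
adjusted = adjust_boundary(5, tokens)  # alignment put the boundary before ","
print(tokens[:adjusted])  # current segment, now ending with the comma
print(tokens[adjusted:])  # next segment, now starting with "y"
```

Working on token indices rather than character offsets keeps the adjustment independent of how the tokens are later joined back into text.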
New: SegmentBoundaryAdjuster

Two new classes in SegmentBoundaryAdjuster.cs:
- TokenRejoiner: reconstructs token lists into strings with correct punctuation spacing (no space before , / . / closing quotes, no space after opening brackets/quotes).
- SegmentBoundaryAdjuster: adjusts a segment boundary by moving punctuation (, ; . ? ! closing quotes/brackets) from the head of the next segment to the tail of the current one.
  - AdjustTokenizedSegmentPairBoundaries(int boundary, IReadOnlyList<string> tokens): token-index variant used by the handler.

Change: PlaceMarkersUsfmUpdateBlockHandler

After PredictMarkerLocation, paragraph markers now go through AdjustTokenizedSegmentPairBoundaries before their string index is resolved:

Before: alignment places \p before the comma → paragraph opens with ", y esta prueba…"
After: comma stays in the preceding paragraph → "Este texto está en inglés," / "\p y esta prueba…"
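The punctuation-spacing rule described for the token rejoiner, and its effect on the before/after example, can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the PR's implementation: `rejoin`, `NO_SPACE_BEFORE`, and `NO_SPACE_AFTER` are hypothetical names, and the character sets are a guess at what "closing quotes" and "opening brackets/quotes" cover.

```python
# Hypothetical sketch: names and character sets are assumptions,
# not the classes added in this PR.
NO_SPACE_BEFORE = {",", ".", ";", "?", "!", ")", "]", "\u201d", "\u2019"}
NO_SPACE_AFTER = {"(", "[", "\u201c", "\u2018"}


def rejoin(tokens: list[str]) -> str:
    """Join tokens with single spaces, except before closing punctuation
    and after opening brackets or quotes."""
    parts: list[str] = []
    for i, token in enumerate(tokens):
        needs_space = (
            i > 0
            and token not in NO_SPACE_BEFORE
            and tokens[i - 1] not in NO_SPACE_AFTER
        )
        if needs_space:
            parts.append(" ")
        parts.append(token)
    return "".join(parts)


tokens = ["Este", "texto", "está", "en", "inglés", ",", "y", "esta", "prueba"]

# Boundary before the comma (index 5): the next paragraph opens with ", y ..."
before = (rejoin(tokens[:5]), rejoin(tokens[5:]))
# Boundary after the comma (index 6): the comma stays with its sentence
after = (rejoin(tokens[:6]), rejoin(tokens[6:]))
print(before)
print(after)
```

Rejoining each side of the boundary separately shows why the token index matters: the same token list yields a paragraph that starts with a stray comma or with a clean word, depending only on where the split falls.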