ReadSight is a PHP library for measuring text readability across 86 languages. It implements 17 readability formulas with language-specific coefficients and uses the Frank M. Liang (TeX) hyphenation algorithm for accurate syllable counting — all with zero runtime dependencies.
Take two texts of almost equal length, a plain passage and a chunk of legal boilerplate:
$plain = 'We made an app that reads your text. It tells you how easy it is to read. You get a score in one second.';
$legal = 'The parties acknowledge that any unauthorized disclosure of confidential information may cause irreparable harm. In such an event, the affected party shall be entitled to seek injunctive relief.';

There is no "score everything" call. You loop over the formulas the language supports and call `score()` for each:
use GlobusStudio\ReadSight\Engine;
$rs = new Engine('en-us');
foreach ($rs->getSupportedFormulas() as $formula) {
$result = $rs->score($formula, $legal);
// $result->score, $result->gradeLevel, $result->interpretation
// ...
}

For both texts that produces:
+-----------------------+-------------------------+----------------------------+
| READABILITY FORMULA   | Plain text              | Legalese                   |
+-----------------------+-------------------------+----------------------------+
| Flesch Reading Ease   | 107.1 Very Easy         | 23.4 Very Hard             |
| Flesch-Kincaid Grade  | 0.3 g0.3 1st Grade      | 13.5 g13.5 College         |
| Gunning Fog           | 3.2 g3.2 Very Easy      | 18.5 g18.5 Extremely Hard  |
| SMOG Index            | 3.1 g3.1 3rd Grade      | 15.2 g15.2 College         |
| Coleman-Liau          | -0.4 g0.0 Kindergarten  | 16.5 g16.5 Graduate        |
| Automated Readability | -2.1 g0.0 Kindergarten  | 13.2 g13.2 College         |
| LIX                   | 8.0 Children's Books    | 49.7 Factual Information   |
| Dale-Chall            | 5.3 5th-6th grade       | 12.2 Graduate              |
| Spache                | 2.3 g2.3 2nd Grade      | 6.5 g5.0 Above 4th Grade   |
+-----------------------+-------------------------+----------------------------+
All 9 formulas for en-us agree the second text is far harder. The bundled example prints this grid plus text metrics and a syllable histogram, for any text and language:
php examples/dashboard.php
php examples/dashboard.php --lang=de-1996 --file=essay.txt

17 formulas, 86 languages, one consistent API. Five of the formulas are truly universal: Gunning Fog, SMOG, Coleman-Liau, ARI and LIX score text in every one of the 86 languages. The remaining 12 are language-aware, each carrying its own published coefficients: Flesch Reading Ease and Flesch-Kincaid span 12 languages, the Wiener Sachtextformel speaks German, Gulpease speaks Italian, OSMAN speaks Arabic, and the Fernández-Huerta · Szigriszt-Pazos · Gutiérrez-Polini · Crawford family handles Spanish. `getSupportedFormulas()` then hands each language exactly the slice that fits it (9 formulas for en-us, 11 for es, 8 for de-1996), so an English-only metric never lands on a Thai sentence by mistake.
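The "universal plus language-specific" selection described above can be pictured as a simple lookup. The following is a minimal, self-contained sketch and not ReadSight's actual registry: the function `supportedFormulas()` and most formula keys are hypothetical (only `gunning_fog` and `wiener_sachtextformel` appear in this README), and only three languages are filled in for illustration.

```php
<?php
// Hypothetical sketch of per-language formula selection (not the library's code).

// Formulas that need no language-specific coefficients.
const UNIVERSAL_FORMULAS = [
    'gunning_fog', 'smog_index', 'coleman_liau',
    'automated_readability_index', 'lix',
];

// Language-aware formulas; keys other than 'wiener_sachtextformel' are assumed names.
const LANGUAGE_FORMULAS = [
    'en-us'   => ['flesch_reading_ease', 'flesch_kincaid_grade_level', 'dale_chall', 'spache'],
    'de-1996' => ['flesch_reading_ease', 'flesch_kincaid_grade_level', 'wiener_sachtextformel'],
    'it'      => ['flesch_reading_ease', 'gulpease'],
];

/**
 * Return the universal formulas plus whatever the language adds.
 *
 * @return list<string>
 */
function supportedFormulas(string $language): array
{
    return array_merge(UNIVERSAL_FORMULAS, LANGUAGE_FORMULAS[$language] ?? []);
}
```

With this data, `supportedFormulas('en-us')` yields 9 entries and `supportedFormulas('de-1996')` yields 8, matching the counts quoted above, while a language without extra entries (such as `th`) gets only the five universal formulas.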
- Installation
- See It in Action
- Quick Start
- Syllable Counting Modes
- Demo
- Supported Languages
- Readability Formulas
- FormulaResult
- Performance
- Custom Configuration
- Architecture
- Data Sources
- Development
- License
composer require globus-studio/readsight

Requirements:
- PHP >= 8.2
- ext-mbstring
- ext-json
No other runtime dependencies.
use GlobusStudio\ReadSight\Engine;
$engine = new Engine('en-us');
// Syllable counting
$engine->syllableCount('banana'); // 3
$engine->splitSyllables('hyphenation'); // ['hyp', 'hen', 'ati', 'on'] (4 syllables, heuristic split)
$engine->splitWord('hyphenation'); // ['hy', 'phen', 'ation'] (TeX hyphenation points)
// Text analysis
$stats = $engine->analyze('The quick brown fox jumps over the lazy dog.');
echo "Words: {$stats->wordCount}, Syllables: {$stats->syllableCount}\n";
// Readability formulas
$fre = $engine->fleschReadingEase($text);
echo "Flesch Reading Ease: {$fre->score} - {$fre->interpretation}\n";
$fog = $engine->gunningFog($text);
echo "Gunning Fog: {$fog->score} (grade {$fog->gradeLevel})\n";
$lix = $engine->lix($text);
echo "LIX: {$lix->score} - {$lix->interpretation}\n";

ReadSight has three syllable counting modes, configured per language via `syllableMode` in `data/languages/*.json`:
| Mode | How it works | Count accuracy | Split accuracy |
|---|---|---|---|
| `heuristic` | Vowel patterns + word list + prefix/suffix rules | ✓ | ≈ approximate |
| `tex` | Frank M. Liang hyphenation algorithm (TeX `.tex` patterns) | ✓ | ✓ exact |
| `composite` | Heuristic first, TeX as fallback | ✓ | ≈ approximate (uses heuristic split) |
The default mode is tex. 84 languages use tex; 2 use composite (en-us, en-gb).
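To make the heuristic idea concrete, here is a deliberately small sketch of vowel-group counting with a problem-word override list consulted first. It is an illustration under stated assumptions, not the library's `HeuristicSyllableCounter`: the function name `countSyllables()`, the silent-"e" rule, and the override for "area" are all hypothetical, and the real counter also applies prefix/suffix rules that are omitted here.

```php
<?php
/**
 * Hypothetical heuristic: one syllable per vowel group, with overrides first.
 *
 * @param array<string, int> $overrides known problem words and their counts
 */
function countSyllables(string $word, array $overrides = []): int
{
    $word = strtolower($word);

    // Problem words bypass the heuristic entirely.
    if (isset($overrides[$word])) {
        return $overrides[$word];
    }

    // Each run of consecutive vowels counts as one syllable.
    $count = (int) preg_match_all('/[aeiouy]+/', $word);

    // Drop a trailing silent "e" ("made"), but keep "-le" endings ("table").
    if ($count > 1 && preg_match('/[^aeiouy]e$/', $word) === 1 && !str_ends_with($word, 'le')) {
        $count--;
    }

    return max(1, $count);
}

echo countSyllables('banana'), "\n";              // 3
echo countSyllables('hyphenation'), "\n";         // 4
echo countSyllables('area'), "\n";                // 2 (heuristic undercounts)
echo countSyllables('area', ['area' => 3]), "\n"; // 3 (override wins)
```

The "area" case shows why a word list is useful: vowel grouping merges "ea" into a single syllable, so an explicit entry is the simplest correction.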
$engine = new Engine('en-us'); // composite mode - heuristic wins
$engine->syllableCount('hyphenation'); // 4 ✓ (in problemWords list)
$engine->splitSyllables('hyphenation'); // ['hyp', 'hen', 'ati', 'on'] - heuristic: equal-width split, ≈ approximate
$engine->splitWord('hyphenation'); // ['hy', 'phen', 'ation'] - TeX hyphenator: exact points
$engine = new Engine('de-1996'); // tex mode
$engine->syllableCount('hyphenation'); // 4 ✓ (TeX patterns)
$engine->splitSyllables('hyphenation'); // ['hy', 'phena', 'ti', 'on'] - TeX: exact
$engine->splitWord('hyphenation'); // ['hy', 'phena', 'ti', 'on'] - same, both use TeX

Tip: `splitWord()` always uses the TeX hyphenator (exact). `splitSyllables()` may use the heuristic split (approximate) in `composite`/`heuristic` modes. For syllable counts both are correct.

Note: `addHyphenations()` adds overrides to the TeX hyphenator. These affect `splitWord()` but NOT `splitSyllables()` in `composite`/`heuristic` modes (the heuristic counter doesn't see them).
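For readers unfamiliar with Liang's algorithm, the sketch below shows its core idea on a toy scale: every pattern that occurs in the dot-delimited word contributes digit weights to the gaps between letters, the maximum weight per gap wins, and odd values mark allowed break points. The nine patterns are the ones used in the classic TeXbook walkthrough for "hyphenation", not a real hyph-utf8 file. The function names `parsePattern()` and `hyphenate()` are hypothetical, the code is ASCII-only, and it is not how ReadSight's `LiangHyphenator` is implemented.

```php
<?php
/**
 * Split a pattern such as "hy3ph" into letters ("hyph") and gap weights ([0,0,3,0,0]).
 *
 * @return array{0: string, 1: list<int>}
 */
function parsePattern(string $pattern): array
{
    $letters = '';
    $weights = [0];
    foreach (str_split($pattern) as $ch) {
        if (ctype_digit($ch)) {
            $weights[count($weights) - 1] = (int) $ch; // weight for the current gap
        } else {
            $letters .= $ch;
            $weights[] = 0;                            // open the gap after this letter
        }
    }
    return [$letters, $weights];
}

/**
 * @param list<string> $patterns
 * @return list<string>
 */
function hyphenate(string $word, array $patterns, int $leftMin = 2, int $rightMin = 3): array
{
    $dotted = '.' . strtolower($word) . '.';
    $scores = array_fill(0, strlen($dotted) + 1, 0);

    foreach ($patterns as $pattern) {
        [$letters, $weights] = parsePattern($pattern);
        $offset = 0;
        // Apply the pattern at every position where its letters occur.
        while (($pos = strpos($dotted, $letters, $offset)) !== false) {
            foreach ($weights as $j => $weight) {
                $scores[$pos + $j] = max($scores[$pos + $j], $weight);
            }
            $offset = $pos + 1;
        }
    }

    // $scores[$i + 1] is the gap just before $word[$i]; odd means "may break here".
    $parts = [];
    $start = 0;
    for ($i = $leftMin; $i <= strlen($word) - $rightMin; $i++) {
        if ($scores[$i + 1] % 2 === 1) {
            $parts[] = substr($word, $start, $i - $start);
            $start = $i;
        }
    }
    $parts[] = substr($word, $start);
    return $parts;
}

$toyPatterns = ['hy3ph', 'he2n', 'hena4', 'hen5at', '1na', 'n2at', '1tio', '2io', 'o2n'];
echo implode('-', hyphenate('hyphenation', $toyPatterns)), "\n"; // hy-phen-ation
```

The `$leftMin` and `$rightMin` parameters mirror TeX's minimum fragment lengths; the defaults of 2 and 3 are an assumption chosen for this example.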
Run the interactive demo to see ReadSight in action:
php examples/demo.php

This analyzes built-in sample text and outputs:
- Syllable breakdown with hyphenation points for common words
- Text statistics - letters, words, sentences, syllables, histogram
- All applicable readability formulas with scores and interpretations
Compare the same text across 8 languages:
php examples/demo.php --compare

Analyze your own text file:
php examples/demo.php --file=essay.txt
php examples/demo.php --file=essay.txt --lang=de-1996

86 languages across 19 writing systems: Latin, Cyrillic, Arabic, Hebrew, Devanagari, Bengali, Tamil, Thai, Greek, Armenian, Georgian, Gujarati, Gurmukhi, Kannada, Malayalam, Odia, Telugu, Ethiopic, Coptic.
$engine = new Engine('ru'); // Russian
$engine = new Engine('de-1996'); // German (1996 reform)
$engine = new Engine('es'); // Spanish
$engine = new Engine('th'); // Thai
// List all supported languages
$langs = Engine::getSupportedLanguages();
# ['af', 'ar', 'as', 'be', 'bg', 'bn', 'ca', 'cop', 'cs', 'cu', 'cy', 'da',
# 'de-1901', 'de-1996', 'de-ch-1901', 'el-monoton', 'el-polyton', 'en-gb',
# 'en-us', 'eo', 'es', 'et', 'eu', 'fa', 'fi', 'fi-x-school', 'fr', 'fur',
# 'ga', 'gl', 'grc', 'gu', 'he', 'hi', 'hr', 'hsb', 'hu', 'hy', 'ia', 'id',
# 'is', 'it', 'ka', 'kk', 'kmr', 'kn', 'la', 'la-x-classic', 'la-x-liturgic',
# 'lt', 'lv', 'mk', 'ml', 'mn-cyrl', 'mn-cyrl-x-lmc', 'mr', 'mul-ethi', 'nb',
# 'nl', 'nn', 'oc', 'or', 'pa', 'pi', 'pl', 'pms', 'pt', 'rm', 'ro', 'ru',
# 'sa', 'sh-cyrl', 'sh-latn', 'sk', 'sl', 'sq', 'sr-cyrl', 'sv', 'ta', 'te',
# 'th', 'tk', 'tr', 'uk', 'vi', 'zh-latn-pinyin']

| Formula | Method | Type | Score Range |
|---|---|---|---|
| Gunning Fog | `gunningFog()` | Syllable-based | 0–20+ |
| SMOG Index | `smogIndex()` | Syllable-based | 3–18+ |
| Coleman-Liau | `colemanLiau()` | Letter-based | 0–18+ |
| ARI | `automatedReadabilityIndex()` | Letter-based | 0–18+ |
| LIX | `lix()` | Letter-based | 20–60+ |
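As a reference point for what these five scores measure, the sketch below computes them from raw counts using the widely published forms of each formula. It is an illustration only: the function `universalScores()` and its parameter names are hypothetical, ARI is fed the letter count as a stand-in for its character count, and ReadSight's own implementation, rounding, and grade normalization may differ.

```php
<?php
/**
 * Widely published forms of the five universal formulas, from raw counts.
 *
 * @param int $complexWords words with three or more syllables
 * @param int $longWords    words with more than six letters
 * @return array<string, float>
 */
function universalScores(int $words, int $sentences, int $letters, int $complexWords, int $longWords): array
{
    if ($words < 1 || $sentences < 1) {
        throw new InvalidArgumentException('Need at least one word and one sentence.');
    }

    $wordsPerSentence = $words / $sentences;

    return [
        // 0.4 * (average sentence length + percentage of complex words)
        'gunning_fog'  => 0.4 * ($wordsPerSentence + 100 * $complexWords / $words),
        // Polysyllables normalized to a 30-sentence sample
        'smog_index'   => 1.0430 * sqrt($complexWords * 30 / $sentences) + 3.1291,
        // L = letters per 100 words, S = sentences per 100 words
        'coleman_liau' => 0.0588 * (100 * $letters / $words)
                        - 0.296 * (100 * $sentences / $words) - 15.8,
        // Assumption: letters approximate ARI's character count
        'ari'          => 4.71 * ($letters / $words) + 0.5 * $wordsPerSentence - 21.43,
        // Average sentence length + percentage of long words
        'lix'          => $wordsPerSentence + 100 * $longWords / $words,
    ];
}

// 100 words, 5 sentences, 450 letters, 10 complex words, 20 long words
print_r(universalScores(100, 5, 450, 10, 20));
```

Only Gunning Fog and SMOG depend on syllable counts (through `$complexWords`); the other three need nothing beyond letters, words, and sentences, which is consistent with the "Letter-based" type in the table.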
| Language | Formulas |
|---|---|
| English (`en-us`, `en-gb`) | Flesch Reading Ease, FK Grade Level, Dale-Chall*, Spache* |
| German (`de-*`) | Flesch Reading Ease (Amstad), FKGL, Wiener Sachtextformel (4 variants) |
| Russian (`ru`) | Flesch Reading Ease (Oborneva), FKGL |
| Spanish (`es`) | Flesch Reading Ease, Fernandez-Huerta, Szigriszt-Pazos, Gutierrez-Polini, Crawford |
| Italian (`it`) | Flesch Reading Ease, Gulpease |
| French (`fr`) | Flesch Reading Ease (Kandel-Moles) |
| Dutch (`nl`) | Flesch Reading Ease (Douma) |
| Portuguese (`pt`) | Flesch Reading Ease (Martins) |
| Turkish (`tr`) | Flesch Reading Ease (Ateşman) |
| Polish (`pl`) | FOG-PL |
| Arabic (`ar`) | OSMAN |
* Note: Dale-Chall and Spache formulas use a syllable-based heuristic to estimate difficult words (1-syllable ≈ easy). This is a simplified estimation, not based on the original Dale/Spache word lists. For accurate Dale-Chall/Spache scores, a curated word list would be required.
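The idea of one formula with per-language coefficients can be sketched with Flesch Reading Ease, using the commonly cited English constants and Amstad's German adaptation. This is a hedged illustration rather than the library's configuration: the constant `FRE_COEFFICIENTS`, the short language keys, and the function name are hypothetical, and ReadSight stores its coefficients in per-language JSON files whose layout is not shown in this README.

```php
<?php
// Commonly cited coefficients: score = base - asl * (words/sentences) - asw * (syllables/words)
const FRE_COEFFICIENTS = [
    'en' => ['base' => 206.835, 'asl' => 1.015, 'asw' => 84.6], // Flesch (English)
    'de' => ['base' => 180.0,   'asl' => 1.0,   'asw' => 58.5], // Amstad (German)
];

function fleschReadingEaseFor(string $lang, int $words, int $sentences, int $syllables): float
{
    if (!isset(FRE_COEFFICIENTS[$lang])) {
        throw new InvalidArgumentException("No coefficients for language: {$lang}");
    }
    if ($words < 1 || $sentences < 1) {
        throw new InvalidArgumentException('Need at least one word and one sentence.');
    }

    $c = FRE_COEFFICIENTS[$lang];
    $averageSentenceLength   = $words / $sentences;
    $averageSyllablesPerWord = $syllables / $words;

    return $c['base'] - $c['asl'] * $averageSentenceLength - $c['asw'] * $averageSyllablesPerWord;
}

// Same counts, different coefficients: 100 words, 5 sentences, 150 syllables
printf("en: %.3f\n", fleschReadingEaseFor('en', 100, 5, 150)); // en: 59.635
printf("de: %.3f\n", fleschReadingEaseFor('de', 100, 5, 150)); // de: 72.250
```

The same text statistics land on different points of the scale depending on the language, which is why applying English coefficients to German text would understate its readability.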
Generic dispatching:
$result = $engine->score('gunning_fog', $text);
$result = $engine->score('wiener_sachtextformel', $text);

$result->score; // float - raw formula score
$result->gradeLevel; // ?float - normalized grade level (FKGL, GF, SMOG, CL, ARI)
$result->interpretation; // string - qualitative interpretation ("Easy", "Hard")
$result->formulaName; // string - formula key
$result->languageCode; // string - language code used
$result->inputs; // array<string, float|int> - intermediate values for debugging

$engine->syllableCount(string $word): int
$engine->splitWord(string $word): list<string>
$engine->splitSyllables(string $word): list<string>
$engine->wordCount(string $text): int
$engine->sentenceCount(string $text): int
$engine->letterCount(string $text): int
$engine->totalSyllables(string $text): int
$engine->averageSyllablesPerWord(string $text): float
$engine->averageWordsPerSentence(string $text): float
$engine->polysyllableCount(string $text, bool $countProperNouns = true): int
$engine->wordsWithMoreThanNSyllables(string $text, int $n, bool $countProperNouns = true): int
$engine->histogramSyllables(string $text): array<int, int>
$engine->analyze(string $text): TextStatistics
`splitSyllables` vs `splitWord`: `splitSyllables` may use the heuristic (approximate) split, depending on the language's `syllableMode`. `splitWord` always uses the TeX hyphenator for exact hyphenation points. Syllable counts are accurate in all modes. See Syllable Counting Modes.
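The basic text metrics listed above can be approximated with Unicode-aware regular expressions, which is one reason they work across scripts. The sketch below is a simplified stand-in, not the library's `TextSplitter`: the three function names are hypothetical, and edge cases such as abbreviations ("e.g."), decimal numbers, or scripts without sentence punctuation are not handled.

```php
<?php
// Simplified, hypothetical counters built on PCRE Unicode properties.

/** A word is a letter followed by letters, combining marks, apostrophes, or hyphens. */
function countWords(string $text): int
{
    return (int) preg_match_all('/\p{L}[\p{L}\p{M}\'’-]*/u', $text);
}

/** Sentences are chunks between terminators that contain at least one letter. */
function countSentences(string $text): int
{
    $chunks = preg_split('/[.!?…]+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $withLetters = array_filter(
        $chunks,
        static fn (string $chunk): bool => preg_match('/\p{L}/u', $chunk) === 1
    );
    return count($withLetters);
}

/** Letters in any script, ignoring digits, spaces, and punctuation. */
function countLetters(string $text): int
{
    return (int) preg_match_all('/\p{L}/u', $text);
}

$text = 'The quick brown fox jumps over the lazy dog.';
echo countWords($text), ' words, ', countSentences($text), ' sentence, ', countLetters($text), " letters\n";
// 9 words, 1 sentence, 35 letters
```

Because `\p{L}` matches letters in any script, the same functions give sensible counts for Cyrillic or Greek input without language-specific code.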
$engine->fleschReadingEase(string $text): FormulaResult
$engine->fleschKincaidGradeLevel(string $text): FormulaResult
$engine->gunningFog(string $text): FormulaResult
$engine->smogIndex(string $text): FormulaResult
$engine->colemanLiau(string $text): FormulaResult
$engine->automatedReadabilityIndex(string $text): FormulaResult
$engine->lix(string $text): FormulaResult
$engine->wienerSachtextformel(string $text, int $variant = 1): FormulaResult
$engine->gulpease(string $text): FormulaResult
$engine->fernandezHuerta(string $text): FormulaResult
$engine->szigrisztPazos(string $text): FormulaResult
$engine->gutierrezPolini(string $text): FormulaResult
$engine->crawford(string $text): FormulaResult
$engine->fogPL(string $text): FormulaResult
$engine->daleChall(string $text): FormulaResult
$engine->spache(string $text): FormulaResult
$engine->osman(string $text): FormulaResult

| Operation | Time |
|---|---|
| Syllable counting (single word) | ~0.15 ms |
| Text analysis (450 words) | ~20 ms |
| Formula calculation (incl. analysis) | ~4 ms |
| Engine init (en-us, cached) | ~5 ms |
| Engine init (de-1996, first load) | ~380 ms |
Caching: compiled patterns are stored as JSON in the cache/ directory.
First load parses .tex files (native hyph-utf8 format); subsequent loads use the pre-compiled cache.
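The compile-once pattern behind those timings can be sketched as follows: parse the source file on the first request, store the result as JSON, and read the JSON on later requests. This is a generic illustration, not ReadSight's `JsonPatternCache`: the function `loadCompiled()`, the cache file naming, and the modification-time check used for invalidation are all assumptions.

```php
<?php
/**
 * Return compiled data for $sourceFile, reusing a JSON cache when it is fresh.
 *
 * @param callable(string): array $compile turns raw file contents into an array
 */
function loadCompiled(string $sourceFile, string $cacheDir, callable $compile): array
{
    $cacheFile = $cacheDir . '/' . basename($sourceFile) . '.json';

    // Fast path: cache exists and is at least as new as the source.
    if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($sourceFile)) {
        return json_decode((string) file_get_contents($cacheFile), true, 512, JSON_THROW_ON_ERROR);
    }

    // Slow path: parse the source once and persist the result.
    $compiled = $compile((string) file_get_contents($sourceFile));

    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0777, true);
    }
    file_put_contents($cacheFile, json_encode($compiled, JSON_THROW_ON_ERROR), LOCK_EX);

    return $compiled;
}
```

A caller would pass the expensive parser as `$compile`; on a warm cache it is never invoked, which is the difference between the "first load" and "cached" rows in the table.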
use GlobusStudio\ReadSight\Engine;
// Set default paths (before creating engines)
Engine::setDefaultCacheDir('/var/cache/readsight');
Engine::setDefaultPatternsDir('/usr/share/readsight/patterns');
Engine::setDefaultLanguagesDir('/usr/share/readsight/languages');
// Or per-instance
$engine = new Engine(
language: 'en-us',
patternsDir: '/custom/patterns',
cacheDir: '/custom/cache',
);
// Add custom hyphenation rules (affects splitWord, not splitSyllables in composite/heuristic modes)
$engine->addHyphenations([
'customword' => 'cus-tom-word',
]);
$engine->splitWord('customword'); // ['cus', 'tom', 'word']

Engine (facade)
├── TextAnalyzer (syllable counting, text metrics)
│ ├── SyllableCounter (strategy: tex | heuristic | composite)
│ │ ├── CompositeSyllableCounter (problemWords → heuristic, rest → TeX)
│ │ ├── HeuristicSyllableCounter (vowel patterns + word list)
│ │ └── TexSyllableCounter → LiangHyphenator (TeX hyphenation)
│ ├── LiangHyphenator
│ │ ├── TexSource (parses .tex from hyph-utf8)
│ │ ├── PatternsCollection (pattern data)
│ │ ├── HyphenationExceptionsCollection (word overrides)
│ │ └── JsonPatternCache (compiled patterns)
│ └── TextSplitter (word/sentence/letter counting)
├── Language (JSON config per language, syllableMode + formulaConfigs)
└── FormulaRegistry (17 formulas)
├── FleschReadingEase (with lang-specific coefficients)
├── GunningFog, SMOG, ColemanLiau, ARI, LIX (universal)
└── WSTF, Gulpease, Fernandez-Huerta, etc. (lang-specific)
- TeX hyphenation patterns: hyph-utf8 version 2026-02-21, the canonical TeX hyphenation repository maintained by the TeX Users Group (TUG). 86 `.tex` pattern files from hyph-utf8 covering 86 language variants. Packaged under each pattern file's original license.
- FRE coefficients: Amstad (DE), Oborneva (RU), Fernandez-Huerta (ES), Vacca-Franchina (IT), Kandel-Moles (FR), Douma (NL), Martins (PT), Ateşman (TR)
- WSTF: Bamberger & Vanecek (DE)
- Gulpease: GULP, La Sapienza University (IT)
composer install # Install dependencies
composer test # Run PHPUnit (257 tests)
composer test:coverage # With HTML coverage report
composer analyse # PHPStan level max
composer cs:check # PHP CS Fixer (dry-run)
composer cs:fix # PHP CS Fixer (apply fixes)
composer check # All checks: CS + PHPStan + Tests

| Metric | Value |
|---|---|
| Tests | 257 |
| Assertions | 1 047 |
| PHPStan | Level max, 0 errors |
| Source classes | 53 |
| Test classes | 21 |
| Supported languages | 86 |
| Writing systems | 19 |
| Readability formulas | 17 |
| Runtime dependencies | 0 |
MIT. Author: Yevhen Leonidov.
TeX pattern files from hyph-utf8 are packaged under their original licenses (see individual file headers).