Detect the language of text.
- franc can support more languages(†) than any other library
- franc is packaged with support for 82, 188, or 402 languages
- franc has a CLI
† - Based on the UDHR, the most translated document in the world.
franc supports many languages, which makes it easy to confuse on short samples. Pass it large documents to get reliable results.
npm:
npm install franc
This installs the franc package, with support for 188 languages (languages which have 1 million or more speakers). franc-min (82 languages, 8 million or more speakers) and franc-all (all 402 possible languages) are also available. Finally, use franc-cli to install the CLI.
Browser builds for franc-min, franc, and franc-all are available on GitHub Releases.
var franc = require('franc')
franc('Alle menslike wesens word vry') // => 'afr'
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট') // => 'ben'
franc('Alle menneske er fødde til fridom') // => 'nno'
franc('') // => 'und'
franc('the') // => 'und'
/* You can change what’s too short (default: 10): */
franc('the', {minLength: 3}) // => 'sco'
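Because empty or too-short input yields 'und', callers may want an explicit fallback instead of passing that code along. A minimal sketch, assuming the detector is handed in as a function with the same call shape as franc above; `detectOrDefault`, its `fallback` parameter, and `stubDetect` are hypothetical names for illustration and are not part of franc:

```javascript
// Hypothetical helper, not part of franc: wraps any detector that follows
// franc's convention of returning 'und' for undetermined input.
function detectOrDefault(detect, text, fallback, options) {
  var code = detect(text, options || {})
  return code === 'und' ? fallback : code
}

// Stand-in detector so the sketch runs without installing anything.
// It only imitates the length check; in real use, pass require('franc').
function stubDetect(text, options) {
  var minLength = options.minLength === undefined ? 10 : options.minLength
  return text.length < minLength ? 'und' : 'eng'
}

console.log(detectOrDefault(stubDetect, 'the', null)) // => null
console.log(detectOrDefault(stubDetect, 'the quick brown fox', null)) // => 'eng'
```

Passing the detector as an argument keeps the fallback logic independent of which franc package (franc-min, franc, or franc-all) is installed.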
console.log(franc.all('O Brasil caiu 26 posições'))
Yields:
[ [ 'por', 1 ],
[ 'src', 0.8797557538750587 ],
[ 'glg', 0.8708313762329732 ],
[ 'snn', 0.8633161108501644 ],
[ 'bos', 0.8172851103804604 ],
... 116 more items ]
console.log(franc.all('O Brasil caiu 26 posições', {whitelist: ['por', 'spa']}))
Yields:
[ [ 'por', 1 ], [ 'spa', 0.799906059182715 ] ]
console.log(franc.all('O Brasil caiu 26 posições', {blacklist: ['src', 'glg']}))
Yields:
[ [ 'por', 1 ],
[ 'snn', 0.8633161108501644 ],
[ 'bos', 0.8172851103804604 ],
[ 'hrv', 0.8107092531705026 ],
[ 'lav', 0.810239549084077 ],
... 114 more items ]
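The scores returned by franc.all can also be used to reject close calls. The sketch below assumes the input is a list of [code, score] pairs sorted best-first, as in the outputs above; `pickConfident` and `minMargin` are hypothetical names, and the thresholds are arbitrary values for illustration rather than recommendations:

```javascript
// Hypothetical helper, not part of franc: takes [code, score] pairs sorted
// best-first and returns the top code only if it leads the runner-up by at
// least minMargin; otherwise it returns 'und'.
function pickConfident(guesses, minMargin) {
  if (guesses.length === 0) return 'und'
  if (guesses.length === 1) return guesses[0][0]
  var margin = guesses[0][1] - guesses[1][1]
  return margin >= minMargin ? guesses[0][0] : 'und'
}

// Scores copied from the first franc.all output above.
var guesses = [
  ['por', 1],
  ['src', 0.8797557538750587],
  ['glg', 0.8708313762329732]
]

console.log(pickConfident(guesses, 0.1)) // => 'por' (margin is about 0.12)
console.log(pickConfident(guesses, 0.2)) // => 'und'
```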
Install:
npm install franc-cli --global
Use:
CLI to detect the language of text
Usage: franc [options] <string>
Options:
-h, --help output usage information
-v, --version output version number
-m, --min-length <number> minimum length to accept
-w, --whitelist <string> allow languages
-b, --blacklist <string> disallow languages
-a, --all display all guesses
Usage:
# output language
$ franc "Alle menslike wesens word vry"
# afr
# output language from stdin (expects utf8)
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | franc
# ben
# blacklist certain languages
$ franc --blacklist por,glg "O Brasil caiu 26 posições"
# src
# output language from stdin with whitelist
$ echo "Alle mennesker er født frie og" | franc --whitelist nob,dan
# nob
Package | Languages | Speakers |
---|---|---|
franc-min | 82 | 8M or more |
franc | 188 | 1M or more |
franc-all | 402 | - |
Note that franc returns ISO 639-3 codes (three-letter codes), not ISO 639-1 or ISO 639-2 codes. See also GH-10 and GH-30.
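If two-letter codes are needed downstream, the conversion has to happen outside franc. A minimal sketch, assuming a small hand-written lookup table; `iso6393To1` and `toIso6391` are hypothetical names, the table covers only a few of the languages mentioned in this document, and many ISO 639-3 codes have no ISO 639-1 equivalent at all:

```javascript
// Hand-written sample table for illustration only; it is not shipped with
// franc and is far from complete.
var iso6393To1 = {
  afr: 'af',
  ben: 'bn',
  dan: 'da',
  nno: 'nn',
  nob: 'nb',
  por: 'pt',
  spa: 'es'
}

// Returns the two-letter code, or null when the table has no entry
// (including for 'und').
function toIso6391(code) {
  return Object.prototype.hasOwnProperty.call(iso6393To1, code)
    ? iso6393To1[code]
    : null
}

console.log(toIso6391('afr')) // => 'af'
console.log(toIso6391('und')) // => null
```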
Franc has been ported to several other programming languages.
- Elixir: paasaa
- Erlang: efranc
- Go: franco, whatlanggo
- R: franc
- Rust: whatlang-rs
The works franc is derived from have themselves also been ported to other languages.
Franc is a derivative work from guess-language (Python, LGPL), guesslanguage (C++, LGPL), and Language::Guess (Perl, GPL). Their creators granted me the rights to distribute franc under the MIT license: respectively, Kent S. Johnson, Jacob R. Rideout, and Maciej Ceglowski.