Commit Graph

15 Commits

Author SHA1 Message Date
Eugen Rochko e6cfa7ab89
Change language detector threshold from 140 characters to 4 words (#10376)
Add `lang` attribute to statuses in web UI
2019-03-26 01:23:59 +01:00
Eugen Rochko 1b167707c2
Fix language detection of non-latin alphabets even at few characters (#10276) 2019-03-15 05:07:09 +01:00
Jeong Arm 144d73730d Leave unknown language as nil if account is remote (#8861)
* Force use language detector if account is remote

* Set unknown remote toot's language as nil
2018-10-05 19:17:46 +02:00
Eugen Rochko 38e9662d78
Disable language detection for texts shorter than 140 characters (#8010)
If the input text is blank after preparation (only mention, or
only URL, or empty as in a media post), then use nil as language,
since it's OK to show to everyone.

Otherwise, always fall back to the server's default locale
2018-07-14 04:05:36 +02:00
Renato "Lond" Cerqueira ad207456d6 Improve language filter (#5724)
* Scrub text of html before detecting language.

* Detect language on statuses coming from activitypub.

* Fix rubocop comments.

* Remove custom emoji from text before language detection
2017-11-16 13:51:38 +01:00
Akihiko Odaki 48d77ea1eb Fix filterable_languages method of SettingsHelper (#4966) 2017-09-16 14:59:41 +02:00
Eugen Rochko 1caf11ddcc Fix language filter codes (#4841)
* Fix language filter codes

CLD3 returns BCP-47 language identifier, filter settings expect
identifiers in the ISO 639-1 format. Convert between formats,
and exclude duplicate languages from filter choices (zh-CN->zh)

* Fix zh name
2017-09-08 12:32:22 +02:00
Eugen Rochko e1fcad34a9 Fix length validator counting things that look like URIs like URLs (#4462)
URI.extract is too strong, not limited to URLs, matched real text.
Same issue was present in language detector.
2017-07-31 05:06:20 +02:00
Matt Jankowski 022008a2a6 Language detection defaults to nil (#3666)
* Default to nil for statuses.language

* Language detection defaults to nil instead of instance UI default
2017-06-09 18:09:37 +02:00
Matt Jankowski d010e270e6 Remove usernames and hashtags from language detection (#3503)
* Add failing specs for hashtag and username extraction in language detector

* Remove usernames and hashtags from text before language detection

* Handle multiple instances of special case, and reduce whitespace
2017-06-01 09:29:14 -04:00