Introduction
The mission of OpenTaal is to create the best possible language support for Dutch in software, open source or otherwise. Besides publishing our sources, we rely on existing software components and packages to do the best they can.
However, there appear to be quite a few obstacles to supporting the Dutch language well.
This page lists the issues we experience, both in implementation and at the strategic level.
Implementation issues
System spell checking
Advice: use Hunspell at system level
System-level spell checking is still very often based on rather primitive spell checkers such as Aspell and Ispell. Switching to Hunspell would improve spell checking considerably.
Failing software:
- Almost all distributions.
Character support
Dutch requires -, ' and ’ to be accepted as part of a word. Otherwise, spell checking is functionally wrong, because it cannot accept words like bureau’s as correct. When using Hunspell, the best way to find the special characters to support as part of a word is to read the WORDCHARS line from Hunspell's affix file.
Failing software:
- Apple Snow Leopard
- Mozilla Firefox (issue unresolved for -; ' works)
- Mozilla Thunderbird (same as above)
- Opera (registered by Opera as DSK-245935)
- OpenOffice.org 3.1 (solved in 3.2)
- Google Chrome (issue 40567)
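The WORDCHARS advice above could look roughly like the following. This is a minimal sketch, not how any of the listed applications work: the function names read_wordchars and tokenize are hypothetical, and the sample affix text is made up for illustration rather than copied from the Dutch affix file.

```python
import re

def read_wordchars(aff_text: str) -> str:
    """Return the characters listed on the WORDCHARS line, or '' if there is none."""
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "WORDCHARS":
            return parts[1]
    return ""

def tokenize(text: str, wordchars: str) -> list[str]:
    """Split text into words, treating the WORDCHARS characters as part of a word."""
    pattern = re.compile(rf"[\w{re.escape(wordchars)}]+")
    # Strip the extra characters from the edges, so a quoted 'word' stays a plain word.
    words = (match.strip(wordchars) for match in pattern.findall(text))
    return [word for word in words if word]

# Hypothetical affix file content (assumption, for illustration only).
SAMPLE_AFF = "SET UTF-8\nWORDCHARS -'’\n"

if __name__ == "__main__":
    chars = read_wordchars(SAMPLE_AFF)
    print(tokenize("De bureau’s van oud-collega's", chars))
    print(tokenize("De bureau’s van oud-collega's", ""))
```

Without the extra characters, bureau’s is split into bureau and s, and the spell checker never sees the actual word.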
Warning level in spell checking
Many words are correct in themselves, but occur more often as a misspelling of another word. Dutch example: kunne (gender) is often a typo for kunnen (to be able to).
A warning level is needed for these words. (More on this under Strategic issues.)
Failing software: All.
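A warning level could be modelled as a third verdict next to correct and error. The sketch below is only an illustration: the Verdict enum, the check_word function and the tiny word sets are hypothetical and do not correspond to any existing spell checker API.

```python
from enum import Enum

class Verdict(Enum):
    CORRECT = "correct"
    WARNING = "warning"   # valid word, but more likely a mistake
    ERROR = "error"

def check_word(word: str, dictionary: set[str], probable_errors: set[str]) -> Verdict:
    """Classify a single word into one of three levels."""
    if word not in dictionary:
        return Verdict.ERROR
    if word in probable_errors:
        return Verdict.WARNING
    return Verdict.CORRECT

if __name__ == "__main__":
    # Toy data (assumption): kunne is valid but usually a typo for kunnen.
    dictionary = {"kunne", "kunnen"}
    probable_errors = {"kunne"}
    for word in ("kunnen", "kunne", "kunen"):
        print(word, check_word(word, dictionary, probable_errors).value)
```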
Multi-word spell checking
In Dutch, there are many words that are only correct in combination with another word. Example: nota bene. (On its own, bene is a typo for benen or been.)
Failing software: All spell checkers and applications.
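One possible way to handle such combinations is to look for known phrases before checking single words. The following is a sketch under assumptions: the check_tokens function, the phrase list and the word list are hypothetical, and a real implementation would have to get its phrases from the dictionary.

```python
def check_tokens(tokens: list[str], single_words: set[str],
                 phrases: set[tuple[str, ...]]) -> list[str]:
    """Return the tokens that are covered neither by a known phrase nor by a single word."""
    errors = []
    longest = max((len(phrase) for phrase in phrases), default=1)
    i = 0
    while i < len(tokens):
        # Try the longest phrase first, down to phrases of two words.
        for n in range(longest, 1, -1):
            chunk = tuple(tokens[i:i + n])
            if len(chunk) == n and chunk in phrases:
                i += n
                break
        else:
            # No phrase starts here: fall back to the single-word check.
            if tokens[i] not in single_words:
                errors.append(tokens[i])
            i += 1
    return errors

if __name__ == "__main__":
    # Toy data (assumption): bene is only valid as part of "nota bene".
    words = {"een", "nota", "been", "benen"}
    phrases = {("nota", "bene")}
    print(check_tokens(["nota", "bene"], words, phrases))
    print(check_tokens(["een", "bene"], words, phrases))
```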
Hyphenation
Hyphenation is commonly implemented using pattern algorithms. The latest enhancements in the OOo routines are very promising. However, some words are ambiguous, e.g. ballet=je (small ballet) and balle=tje (small ball). Ideally, when a word to be hyphenated is ambiguous, the alternatives should be presented to the user.
Failing software: All.
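Presenting ambiguous hyphenations could be sketched as a lookup that runs before the pattern algorithm. Everything named here is an assumption: the AMBIGUOUS table, the hyphenation_options function and the pattern_hyphenate callable (a stand-in for whatever pattern-based hyphenator is in use) are hypothetical.

```python
from typing import Callable

# Hypothetical table of words with more than one valid hyphenation,
# each with a short gloss the application could show to the user.
AMBIGUOUS = {
    "balletje": [("ballet=je", "small ballet"), ("balle=tje", "small ball")],
}

def hyphenation_options(word: str,
                        pattern_hyphenate: Callable[[str], str],
                        ambiguous: dict[str, list[tuple[str, str]]]) -> list[tuple[str, str]]:
    """Return all hyphenation candidates; more than one means the user should choose."""
    if word in ambiguous:
        return list(ambiguous[word])
    # Unambiguous word: use the pattern algorithm, with an empty gloss.
    return [(pattern_hyphenate(word), "")]

if __name__ == "__main__":
    def no_hyphenation(word: str) -> str:
        # Placeholder for a real pattern hyphenator.
        return word

    for hyphenated, gloss in hyphenation_options("balletje", no_hyphenation, AMBIGUOUS):
        print(hyphenated, gloss)
```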
Bugs found and features wanted
Hunspell
Bug: CHECKCOMPOUNDPATTERN does not detect flag-flag conflicts. CHECKCOMPOUNDPATTERN /A /B should prevent words with flag A from being combined with words with flag B, but it does not.
Bug: CHECKCOMPOUNDPATTERN does not work for compounds with more than 2 parts, in all compounding methods.
Bug: a word forbidden by the FORBIDDENWORD flag is sometimes still suggested through compounding.
Feature request: limit the wildness of the suggestions offered by setting a maximum character distance (Levenshtein?) and length, e.g.:
MAXDIFF (number)
MAXDIFF (min length) (max length) (max diff)
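The effect of the single-number form of the requested MAXDIFF can be sketched as a filter on the suggestion list. MAXDIFF does not exist in Hunspell; the limit_suggestions function and the example suggestions are hypothetical, and plain Levenshtein distance is assumed as the measure.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions each cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (char_a != char_b),   # substitution or match
            ))
        previous = current
    return previous[-1]

def limit_suggestions(word: str, suggestions: list[str], max_diff: int) -> list[str]:
    """Keep only suggestions within max_diff edits of the misspelled word."""
    return [s for s in suggestions if levenshtein(word, s) <= max_diff]

if __name__ == "__main__":
    # Toy suggestion list (assumption), as a spell checker might return it.
    print(limit_suggestions("bene", ["been", "benen", "benzine"], max_diff=2))
```

The second proposed form, with a minimum and maximum word length, would apply a different max_diff per length range; that is left out here.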
Feature request: introduce a flag for probably wrong words (words that are actually correct, but more likely to be a mistake):
PROBABLEERROR
Although applications are not yet able to report this, the flag makes it possible to start preparing for it; a feature request to the applications will then follow.
Note that this will also result in an API change.
Feature request: add a parameter that specifies the flag indicator. It is currently a fixed /, which makes it impossible to support km/u as a word. Suggestion: FLAGINDICATOR (flag char)
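The problem with a fixed indicator can be illustrated with a small parser for dictionary entries. This is a sketch of the requested behaviour only: FLAGINDICATOR is a proposal, not an existing Hunspell directive, and the split_entry function and the | character chosen as alternative indicator are hypothetical.

```python
def split_entry(entry: str, flag_indicator: str = "/") -> tuple[str, str]:
    """Split a dictionary entry into (word, flags) at the first flag indicator."""
    word, _, flags = entry.partition(flag_indicator)
    return word, flags

if __name__ == "__main__":
    # With the fixed "/" the word km/u is cut in two and "u" is read as a flag.
    print(split_entry("km/u"))
    # With a configurable indicator (here "|", an assumption) the word survives.
    print(split_entry("km/u|AB", flag_indicator="|"))
```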
Feature request: a flag on the last compound part specifying that the word has to start with an uppercase letter (to force words ending in street to be capitalised).
Bug: the 2 compounding mechanisms interfere with each other.
Bug: KEEPCASE is not applied in compounds.
Bug: option -G reports words that are not in the input (bad for testing).
Feature request: add a word boundary indicator to REP.
Bug: REP with more than one _ fails.
Bug: REP with non-letters in the replacement fails.
Feature request: spell checking of words whose first character is uppercased.
Mozilla (Firefox, Thunderbird)
- Shows only 5 spelling suggestions, which is too few; reported
Opera
- Is not able to do compounding, probably due to the older Hunspell code incorporated (under investigation by Opera as CORE-28935)
OpenOffice.org
- Feature request: after spellchecking a word, re-apply the auto-improvement of the apostrophe
Google Chrome
- Complete Hunspell support (40695)
Strategic issues
As the implementation issues above show, something is functionally wrong in language support.
Spell checking (Hunspell and others) handles only one word at a time and issues no warnings. Grammar checking fills that gap, of course, but is unfortunately not widely supported as a plug-in. Hyphenation is yet another loosely coupled program.
OpenTaal thinks we need a better approach.
What we would like
We think an interface like the one built between OOo and grammar checkers is a good thing. That interface should be made more generally applicable, resulting in a language support interface module that any application is free to implement.
This module allows several plug-ins per locale, each doing its own job and adding markings to the received text, together with improvement suggestions and a request for a certain underline color.
This way, the single-word spell checker could mark erroneous words in red and probably erroneous words in orange, while the grammar checker reports its suggestions in blue, or in different colors for different levels of severity (error, warning, info). Even the synonym function could signal synonym availability and offer suggestions.
Hyphenation would just offer the hyphenation options for the words.
This scheme would allow for different plug-ins using different programming languages, all contributing differently, but presenting text improvement suggestions in a standardised way to the applications.
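To make the idea more concrete, here is a rough sketch of what such a module could look like inside one process. All names (Marking, LanguagePlugin, LanguageSupport, ToySpeller) are hypothetical and not part of OOo or any existing interface; real plug-ins written in other programming languages would need a language-neutral protocol rather than Python classes.

```python
import re
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class Marking:
    start: int                      # offset of the first marked character
    end: int                        # offset just past the last marked character
    severity: str                   # "error", "warning" or "info"
    color: str                      # requested underline color
    suggestions: list[str] = field(default_factory=list)

class LanguagePlugin(Protocol):
    def check(self, text: str, locale: str) -> list[Marking]: ...

class LanguageSupport:
    """Collects the markings of all plug-ins registered for a locale."""
    def __init__(self) -> None:
        self._plugins: dict[str, list[LanguagePlugin]] = {}

    def register(self, locale: str, plugin: LanguagePlugin) -> None:
        self._plugins.setdefault(locale, []).append(plugin)

    def check(self, text: str, locale: str) -> list[Marking]:
        markings: list[Marking] = []
        for plugin in self._plugins.get(locale, []):
            markings.extend(plugin.check(text, locale))
        return sorted(markings, key=lambda m: (m.start, m.end))

class ToySpeller:
    """Toy single-word checker: red for unknown words, orange for probable errors."""
    def __init__(self, words: set[str], probable_errors: set[str]) -> None:
        self.words = words
        self.probable_errors = probable_errors

    def check(self, text: str, locale: str) -> list[Marking]:
        found = []
        for match in re.finditer(r"\w+", text):
            word = match.group().lower()
            if word not in self.words:
                found.append(Marking(match.start(), match.end(), "error", "red"))
            elif word in self.probable_errors:
                found.append(Marking(match.start(), match.end(), "warning", "orange"))
        return found

if __name__ == "__main__":
    hub = LanguageSupport()
    hub.register("nl_NL", ToySpeller({"ik", "kunne", "zien"}, {"kunne"}))
    for marking in hub.check("ik kunne zein", "nl_NL"):
        print(marking)
```

A grammar checker, hyphenator or synonym plug-in would register in the same way and return its own markings, so the application only has to render one kind of result.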