Opentaal.org

Dutch language support issues

Introduction

The mission of OpenTaal is to create the best possible language support for Dutch in software, open source or otherwise. Besides publishing our sources, we rely on existing software components and packages to do the best they can.

However, quite a few issues stand in the way of supporting the Dutch language well.

This page lists the issues we have encountered, both at the implementation level and at the strategic level.

Implementation issues

System spell checking

Advice: use Hunspell at the system level

System-level spell checking is still very often based on rather primitive spell checkers such as Aspell and Ispell. Switching to Hunspell would improve spell checking considerably.

Failing software:

  • Almost all distributions.

 

Character support

Dutch requires the characters - and ' and ’ to be accepted as part of a word. Otherwise, the spell checker cannot correctly validate words such as bureau’s. When using Hunspell, the best way to find the special characters that must be supported as part of a word is to read the WORDCHARS line from Hunspell's affix file.
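As a minimal sketch of that advice, the following Python reads the WORDCHARS line from the text of an affix file and uses it when splitting text into words. The function names and the sample affix content are assumptions made for illustration, and the tokenizer only allows the extra characters between letters or digits, which is simpler than what a real spell checker does.

```python
import re

def read_wordchars(aff_text):
    """Return the characters listed on the WORDCHARS line of an affix file, or ''."""
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "WORDCHARS":
            return parts[1]
    return ""

def tokenize(text, wordchars):
    """Split text into words, treating the extra characters as word-internal."""
    if not wordchars:
        return re.findall(r"\w+", text)
    extra = re.escape(wordchars)
    # A word starts and ends with a letter or digit; the extra characters
    # may only appear in between.
    return re.findall(rf"\w+(?:[{extra}]+\w+)*", text)

# Hypothetical affix file content, not copied from a real dictionary.
SAMPLE_AFF = "SET UTF-8\nWORDCHARS -'’0123456789\n"

wordchars = read_wordchars(SAMPLE_AFF)
print(tokenize("Twee bureau’s en een zee-egel.", wordchars))
# ['Twee', 'bureau’s', 'en', 'een', 'zee-egel']
```

Without the WORDCHARS characters, the same tokenizer would hand bureau and s to the spell checker as two separate words.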

Failing software:

  • Apple Snow Leopard
  • Mozilla Firefox (issue still open for -; ' works)
  • Mozilla Thunderbird (same as above)
  • Opera (registered by Opera as DSK-245935)
  • OpenOffice.org 3.1 (solved in 3.2)
  • Google Chrome (issue 40567)

Warning level in spell checking

Many words are correct in themselves, but occur more often as an error. Dutch example: kunne (meaning gender) is often a typo for kunnen (to be able to).

A warning level is needed for these words. (More on this under Strategic issues.)

Failing software: All.
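A warning level could look roughly like the sketch below, which returns one of three verdicts instead of a plain correct/incorrect answer. The word lists, the Verdict type and the check function are all hypothetical and exist only to illustrate the idea; no existing spell checker API is implied.

```python
from enum import Enum

class Verdict(Enum):
    OK = "ok"
    WARNING = "warning"  # correct word, but more likely a mistake
    ERROR = "error"

# Hypothetical toy data; a real dictionary would be far larger.
DICTIONARY = {"kunnen", "kunne", "wij", "dat"}
PROBABLE_ERRORS = {"kunne": "kunnen"}  # word -> more likely intended word

def check(word):
    """Return (verdict, suggestion) for a single word."""
    lower = word.lower()
    if lower not in DICTIONARY:
        return Verdict.ERROR, None
    if lower in PROBABLE_ERRORS:
        return Verdict.WARNING, PROBABLE_ERRORS[lower]
    return Verdict.OK, None

for w in ["kunnen", "kunne", "kunnnen"]:
    print(w, check(w))
```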

Multi-word spell checking

In Dutch, many words are only correct when combined with another word. Example: nota bene. (Otherwise, bene is a typo for benen or been.)

Failing software: All spell checkers and applications.
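One possible approach, sketched here under the assumption that multi-word entries are stored as tuples next to the ordinary word list, is to check each word on its own first and then accept any word that takes part in a known combination. The data and function names are hypothetical.

```python
# Hypothetical toy data: words valid on their own and multi-word entries.
SINGLE_WORDS = {"nota", "benen", "been", "de"}
MULTI_WORD_ENTRIES = {("nota", "bene")}

def check_tokens(tokens):
    """Return (token, is_correct) pairs, honouring two-word entries."""
    lowered = [t.lower() for t in tokens]
    accepted = [w in SINGLE_WORDS for w in lowered]
    # A word that is wrong on its own becomes correct inside a known pair.
    for i in range(len(lowered) - 1):
        if (lowered[i], lowered[i + 1]) in MULTI_WORD_ENTRIES:
            accepted[i] = accepted[i + 1] = True
    return list(zip(tokens, accepted))

print(check_tokens(["nota", "bene"]))
print(check_tokens(["de", "bene"]))
```

The sketch only handles pairs; longer fixed expressions would need a more general lookup.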

Hyphenation

Hyphenation is commonly implemented using pattern algorithms. The latest enhancements in the OOo routines are very promising. However, some words are ambiguous, e.g. ballet=je (small ballet) and balle=tje (small ball). Ideally, when the word to hyphenate is ambiguous, the alternative hyphenations should be presented to the user.

Failing software: All.
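The idea of presenting ambiguous hyphenations can be sketched as a lookup that returns every candidate rather than a single result. The exception list, the function names and the stand-in hyphenator below are assumptions for illustration; a real implementation would plug in an actual pattern-based hyphenator.

```python
# Hypothetical exception list: words with more than one valid hyphenation,
# written in the "=" notation used above.
AMBIGUOUS = {"balletje": ["ballet=je", "balle=tje"]}

def hyphenation_options(word, pattern_hyphenate):
    """Return every candidate hyphenation for a word."""
    options = AMBIGUOUS.get(word.lower())
    if options is not None:
        return list(options)
    return [pattern_hyphenate(word)]

def needs_user_choice(options):
    """True when the application should ask the user which option is meant."""
    return len(options) > 1

def no_breaks(word):
    """Stand-in for a real pattern-based hyphenator: offers no break points."""
    return word

for w in ["balletje", "fiets"]:
    opts = hyphenation_options(w, no_breaks)
    print(w, opts, "ask user" if needs_user_choice(opts) else "automatic")
```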

 

Bugs found and features wanted

Hunspell

Bug: CHECKCOMPOUNDPATTERN does not detect a flag-flag conflict. CHECKCOMPOUNDPATTERN /A /B should prevent words with flag A from being combined with words with flag B, but it does not.

Bug: CHECKCOMPOUNDPATTERN does not work for compounds with more than two parts, in any of the compounding methods.

Bug: a word forbidden by the flag FORBIDDENWORD sometimes still gets suggested by compounding.

Feature request: limit the wildness of the offered alternatives by setting a maximum character distance (Levenshtein?) and length, e.g.:

MAXDIFF (number)

MAXDIFF (min length) (max length) (max diff)
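To illustrate what such a limit would do, the sketch below filters a list of suggestions by Levenshtein distance, with an optional set of per-length rules in the spirit of the second form. It only models the request as worded here; the function names and rule format are assumptions and say nothing about how Hunspell itself treats an option of that name.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]

def max_diff_for(word, rules, default):
    """Pick the max distance from (min_len, max_len, max_diff) rules."""
    for min_len, max_len, max_diff in rules:
        if min_len <= len(word) <= max_len:
            return max_diff
    return default

def filter_suggestions(word, suggestions, max_diff):
    """Keep only suggestions within max_diff edits of the misspelled word."""
    return [s for s in suggestions if levenshtein(word, s) <= max_diff]

# Hypothetical rules: short words tolerate 1 edit, longer words 2.
RULES = [(1, 4, 1), (5, 99, 2)]
limit = max_diff_for("bene", RULES, default=2)
print(filter_suggestions("bene", ["benen", "been", "banaan"], limit))
```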

Feature request: introduce a flag for probably wrong words (words that are actually correct, but more likely to be a mistake):

PROBABLEERROR

Although applications are not yet able to report this, the flag makes it possible to start preparing for it; a feature request to the applications will then follow.

By the way, this will also result in an API change.

Feature request: add a parameter that specifies the flag indicator. It is currently a fixed /, which makes it impossible to support km/u as a word. Suggestion: FLAGINDICATOR (flag char)
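The effect of a configurable flag indicator can be shown with a small sketch that splits a dictionary entry into word and flags. The split_entry function and the flag name Zn are hypothetical, and the parsing is deliberately simpler than Hunspell's own.

```python
def split_entry(line, flag_indicator="/"):
    """Split a dictionary entry into (word, flags) at the first indicator."""
    word, _, flags = line.partition(flag_indicator)
    return word, flags

# With the fixed "/" the unit km/u is misread as word "km" with flag "u".
print(split_entry("km/u"))

# With a different indicator (here "|", chosen arbitrarily) it survives intact.
print(split_entry("km/u", flag_indicator="|"))
print(split_entry("km/u|Zn", flag_indicator="|"))
```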

Feature request: allow a flag on the last compounding part specifying that the compound has to start with an uppercase letter. (This would force words ending with street to be uppercased.)

Bug: the 2 compounding mechanisms interfere.

Bug: KEEPCASE is not used in compounds

Bug: option -G reports words which are not input (bad for testing)

Feature request: add word border indicator to REP

Bug: REP with more than one _ fails

Bug: REP with non-letters in replacement fails

Feature request: spell checking of words whose first character is uppercased

 

Mozilla (Firefox, Thunderbird)

  • Shows only 5 spelling suggestions, which is too few; reported

Opera

  • Is not able to do compounding, probably because of the older Hunspell code it incorporates (under investigation by Opera as CORE-28935)

OpenOffice.org

  • Feature request: after spellchecking a word, re-apply the auto-improvement of the apostrophe

Google Chrome

  • Complete Hunspell support (issue 40695)

Strategic issues

As shown by the above implementation issues, there is something functionally wrong in language support.

Spell checking (Hunspell and others) handles only one word at a time and gives no warnings. Grammar checking fills that gap, of course, but is unfortunately not widely accepted as a plug-in. Hyphenation is yet another loosely coupled program.

OpenTaal thinks we need a better approach.

What we would like

We think an interface like the one built between OOo and grammar checkers is a good thing. We think that interface should be made more generally applicable, resulting in a language support interface module that any application is free to implement.

This module allows several plug-ins per locale, each doing its own job, which add markings to the received text together with improvement suggestions and a request for a certain underline color.

This way, the single-word spell checker could mark erroneous words in red and probably erroneous words in orange, while the grammar checker reports its suggestions in blue, or in different colors for different levels of severity (error, warning, info). Even the synonym function could signal synonym availability and offer suggestions.

Hyphenation would just offer the hyphenation options for the words.

This scheme would allow for different plug-ins using different programming languages, all contributing differently, but presenting text improvement suggestions in a standardised way to the applications.
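A rough sketch of such a module is given below. It is not the OOo interface; the Marking fields, the LanguageSupport class and both toy plug-ins are assumptions chosen to show how several plug-ins per locale could report markings, suggestions and colors in one standard shape.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Marking:
    start: int       # offset of the first marked character
    end: int         # offset just past the last marked character
    severity: str    # "error", "warning" or "info"
    color: str       # requested underline color
    suggestions: List[str] = field(default_factory=list)
    source: str = ""  # which plug-in produced the marking

Plugin = Callable[[str], List[Marking]]

class LanguageSupport:
    """Collects markings from every plug-in registered for a locale."""
    def __init__(self):
        self._plugins: Dict[str, List[Plugin]] = {}

    def register(self, locale, plugin):
        self._plugins.setdefault(locale, []).append(plugin)

    def check(self, locale, text):
        markings = []
        for plugin in self._plugins.get(locale, []):
            markings.extend(plugin(text))
        return sorted(markings, key=lambda m: (m.start, m.end))

# Hypothetical toy data for the spell checking plug-in.
WORDS = {"wij", "kunnen", "kunne", "dat"}
PROBABLE_ERRORS = {"kunne": "kunnen"}

def spell_plugin(text):
    out = []
    for m in re.finditer(r"\w+", text):
        w = m.group().lower()
        if w not in WORDS:
            out.append(Marking(m.start(), m.end(), "error", "red", [], "spell"))
        elif w in PROBABLE_ERRORS:
            out.append(Marking(m.start(), m.end(), "warning", "orange",
                               [PROBABLE_ERRORS[w]], "spell"))
    return out

def repeated_word_plugin(text):
    """Toy stand-in for a grammar checker: flags a doubled word."""
    return [Marking(m.start(), m.end(), "info", "blue", [m.group(1)], "grammar")
            for m in re.finditer(r"\b(\w+) \1\b", text, flags=re.IGNORECASE)]

service = LanguageSupport()
service.register("nl_NL", spell_plugin)
service.register("nl_NL", repeated_word_plugin)

for marking in service.check("nl_NL", "Wij kunne dat dat xyz"):
    print(marking)
```

Because plug-ins are plain callables returning the same Marking structure, a plug-in written in another language would only need a thin wrapper that produces these records.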



Last modified on Monday 21 June 2010 09:51