OmegaT and Asian languages
Thread poster: Neirda
Neirda  Identity Verified
China
Local time: 16:24
Chinese to French
Mar 20, 2013

Hello everyone,

So I just got started with OmegaT and Chinese (simplified), and quickly discovered that Fuzzy Matches and Glossary don't seem to work as they should.

Nothing ever shows up in either pane, even after adding entries to the glossary.
I created a test file with sentences of varying similarity and length to test the software's behavior, and it seems that only 100% matches are correctly detected and replaced.

I tried running the software through AppLocale (the Windows launcher for non-Unicode programs), with no change. I checked the project files: the TMX files and the glossary file are correctly created and filled. So I don't know where the problem could be. Maybe a character display issue?


 
Didier Briel  Identity Verified
France
Local time: 10:24
English to French
Use a tokenizer Mar 20, 2013

Pierret Adrien wrote:
So I just got started with OmegaT and Chinese (simplified), and quickly discovered that Fuzzy Matches and Glossary don't seem to work as they should.

Nothing ever shows up in either pane, even after adding entries to the glossary.
I created a test file with sentences of varying similarity and length to test the software's behavior, and it seems that only 100% matches are correctly detected and replaced.

I tried running the software through AppLocale (the Windows launcher for non-Unicode programs), with no change. I checked the project files: the TMX files and the glossary file are correctly created and filled. So I don't know where the problem could be. Maybe a character display issue?

For glossaries, it could be an encoding issue. You could test with an English or French source document and glossary to check that everything works as it should.
(As long as your glossaries are correctly encoded in UTF-8, not in the system encoding, everything should be fine.)
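
To make the encoding point concrete, here is a minimal Python sketch (not part of OmegaT) that checks whether a glossary file decodes cleanly as UTF-8. The helper names `is_valid_utf8` and `write_sample`, the sample entry, and the choice of GBK as the "system encoding" are all assumptions made for illustration.

```python
import os
import tempfile


def is_valid_utf8(path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def write_sample(text, encoding):
    """Write a small sample glossary in the given encoding and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as handle:
        handle.write(text.encode(encoding))
    return path


# Hypothetical tab-separated entry: source term, tab, target term.
entry = "翻译\ttraduction\n"

utf8_path = write_sample(entry, "utf-8")
gbk_path = write_sample(entry, "gbk")  # assumed stand-in for a system-encoded file

utf8_ok = is_valid_utf8(utf8_path)
gbk_ok = is_valid_utf8(gbk_path)

# Clean up the temporary sample files.
os.remove(utf8_path)
os.remove(gbk_path)
```

A file that fails this check was probably saved in a legacy encoding and would need to be re-saved as UTF-8 in a text editor before use.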

For fuzzy matches, an encoding issue is very unlikely.

By default, OmegaT uses the Java tokenizer, which can only detect words when they are separated by spaces. Of course, that doesn't work for CJK languages.
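
A rough Python sketch of why space-based tokenization gives only exact matches for Chinese. This is not OmegaT's actual matching algorithm: the Jaccard overlap used here is a simplified stand-in for a fuzzy score, and the function names and sample sentences are made up for illustration.

```python
def whitespace_tokens(sentence):
    """Split on whitespace only, as a simplified space-based tokenizer would."""
    return sentence.split()


def token_overlap(a, b):
    """Jaccard similarity of the two token sets, from 0.0 to 1.0."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


english_1 = "Open the project file"
english_2 = "Open the glossary file"
chinese_1 = "打开项目文件"
chinese_2 = "打开词汇表文件"

# English: three of the five distinct words are shared, so the score is partial.
english_score = token_overlap(whitespace_tokens(english_1), whitespace_tokens(english_2))

# Chinese: each sentence is a single "word", so similar sentences share nothing.
chinese_score = token_overlap(whitespace_tokens(chinese_1), whitespace_tokens(chinese_2))

# Only an identical sentence produces a match.
identical_score = token_overlap(whitespace_tokens(chinese_1), whitespace_tokens(chinese_1))
```

With no spaces to split on, each Chinese segment is treated as one token, which fits the symptom of only 100% matches being found.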

That's why we also provide other tokenizers:
http://www.omegat.org/en/howtos/tokenizer.php

For Chinese, LuceneSmartChineseTokenizer seems to be the best one.
Do not forget to set a target tokenizer as well, so that you don't have issues with spellchecking in European target languages.
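
As a rough illustration of what a CJK-aware tokenizer changes, the Python sketch below splits Chinese text into overlapping two-character tokens. This is only a crude stand-in: LuceneSmartChineseTokenizer performs real word segmentation, which this code does not attempt. The function names, the sample sentences, and the Jaccard score are assumptions for illustration, not OmegaT's implementation.

```python
def char_bigrams(sentence):
    """Overlapping two-character tokens; a crude stand-in for word segmentation."""
    text = "".join(sentence.split())
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def token_overlap(a, b):
    """Jaccard similarity of the two token sets, from 0.0 to 1.0."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


source = "打开项目文件"
candidate = "打开词汇表文件"

source_tokens = char_bigrams(source)
candidate_tokens = char_bigrams(candidate)

# The two sentences now share some tokens, so a partial score becomes possible.
shared = set(source_tokens) & set(candidate_tokens)
score = token_overlap(source_tokens, candidate_tokens)
```

Once the text is broken into smaller units, similar segments share tokens and a partial match can be scored, and shorter units are available for comparison against glossary entries.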

Didier


 
Neirda  Identity Verified
China
Local time: 16:24
Chinese to French
TOPIC STARTER
You were right Mar 20, 2013

I must have missed something the first time. I set up the tokenizer launcher again according to the instructions, and now both fuzzy matches and the glossary work.

Thank you, and sorry for the trouble.


 

