ma1cius wrote:
I just tried assessing 9 xlsx files in OmegaT and got a total word count of 11,063. I ran the same files through SmartCAT and got a total word count of 18,336.
OmegaT does not count repetitions in XLSX files, simply because they are not in the file (Excel stores each distinct text string only once and removes the duplicates). To get a word count including repetitions, save the XLSX file in another format (e.g., XML Spreadsheet 2003).
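To see the effect on a given file, here is a minimal Python sketch. It is not how OmegaT itself counts, only an illustration under a few assumptions: the function names count_words and xlsx_word_counts are made up for this example, a "word" is any whitespace-separated token containing at least one letter (so number-only tokens are skipped), and inline strings and formatting details are ignored. It compares the words stored once in the workbook's shared string table with the words you get when every cell that refers to a string is counted.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main SpreadsheetML parts of an XLSX package.
NS_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": NS_URI}


def count_words(text):
    """Rough word count (an assumption, not OmegaT's rule):
    whitespace-separated tokens that contain at least one letter."""
    return sum(1 for tok in text.split() if any(ch.isalpha() for ch in tok))


def xlsx_word_counts(path):
    """Return (unique_words, words_with_repetitions) for an XLSX file.

    unique_words counts each shared string once, as stored in the file.
    words_with_repetitions counts a string once per cell that uses it.
    """
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        if "xl/sharedStrings.xml" not in names:
            return 0, 0  # workbook contains no shared text strings

        root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
        # Each <si> is one distinct string; join the text of its <t> runs.
        strings = [
            "".join(t.text or "" for t in si.iter("{%s}t" % NS_URI))
            for si in root.findall("m:si", NS)
        ]
        unique = sum(count_words(s) for s in strings)

        with_reps = 0
        for name in names:
            if not re.fullmatch(r"xl/worksheets/[^/]+\.xml", name):
                continue
            sheet = ET.fromstring(zf.read(name))
            for cell in sheet.iter("{%s}c" % NS_URI):
                # t="s" means the cell value is an index into the string table.
                if cell.get("t") != "s":
                    continue
                value = cell.find("m:v", NS)
                if value is not None and value.text is not None:
                    with_reps += count_words(strings[int(value.text)])
    return unique, with_reps
```

Calling xlsx_word_counts("yourfile.xlsx") on each of the files and comparing the two numbers should show how much of the gap comes from repeated cells alone.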
Copying out the text from the largest file into a Word file, not including segments that were just numbers, I got a count of 12,053 from Microsoft Word's built-in word count. Including the numbers, this came to 19,147.
SmartCAT counted the same document at 13,187 words and counted 2,372 segments that contained just numbers or symbols, which, I think, were not included in the word count. OmegaT counted this file at 8,029 words.
That is unusual. Generally, OmegaT's count is rather close to Word's.
This variance seems enormous. I can understand if it's not counting number/symbol-only segments,
Indeed, OmegaT does not count numbers.
which, I think, accounts for much of the discrepancy between SmartCAT and Word but, even allowing for that, OmegaT's count comes out at about two thirds of MS Word's count. There is a lot of repetition but this should, surely, just be shown in the statistics and not affect the total words.
As I wrote above, this is unusual for Word documents. Have you checked which elements OmegaT is set to load in the Word filter? Options > File Filters > Microsoft XML.
Do I have something majorly wrong in my OmegaT settings or have I somehow misunderstood how OmegaT presents word counts?
Another setting that might affect word count is Options > Tag processing (whether you include custom tags or not in statistics).
Can anyone explain how I might have got such different word counts and what I can do to restore my faith in the statistics generated by these CAT tools? I am using OmegaT 3.6.0 update 8.
For XLSX files, the explanation is obvious. For Word, it's hard to say without details.
Didier