Extracting project-relevant units from a large TM
Thread poster: Filip Látal
Filip Látal
Czech Republic
Local time: 01:59
English to Czech
Apr 8, 2022

I have a document for translation and a very large TM. I know there are translation units in the TM relevant to my document but I suppose most of them are not relevant. What are my options if I want to extract only those units that are XX% matches in relation to my document and/or that contain some frequently occurring phrases from my document?

 
Natalie
Natalie  Identity Verified
Poland
Local time: 01:59
Member (2002)
English to Russian

Moderator of this forum
SITE LOCALIZER
Which CAT tool? Apr 8, 2022

Hi Filip, you forgot to mention the CAT tool that you are using.

Or are you just looking for a tool that would allow you to perform this task?


 
Arianne Farah
Arianne Farah  Identity Verified
Canada
Local time: 19:59
Member (2008)
English to French
If in Studio Apr 8, 2022

Right-click the project and select Batch Tasks > Populate Project Translation Memories.

Once that is done, right-click the project again and select Project Settings; you'll see the project TM listed under the main TM. Uncheck the main TM and only the project TM will be used for the project. This is quite practical when you have a huge TM and just want to pull the TUs that are relevant to your project.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
CafeTran Espresso's Total Recall Apr 9, 2022

Filip Látal wrote:

I have a document for translation and a very large TM. I know there are translation units in the TM relevant to my document but I suppose most of them are not relevant. What are my options if I want to extract only those units that are XX% matches in relation to my document and/or that contain some frequently occurring phrases from my document?


When you finish your translation project, the created translation memory is saved to a TMX file such as ProjectTM.tmx located in the project folder. You can reuse the TMX file in another project, but a more convenient solution is to store the segments of your project in a special memory system called Total Recall. It lets you build up your growing segments base over time and recall only the relevant segments for each new project. Think of it as the long-term memory which is able to bring the segments back to the short-term (working) memory used in the current translation project. Total Recall does not retrieve all the segments but only those which contain the words of your currently translated source document, making the working memory compact and fast. The recalled segments are used in automatic matching actions such as fuzzy and subsegment matching. They also take part in the auto-assembling process, concordance search and auto-suggestion. Finally, the segments can be saved to a TMX file and sent to another translator who works on the project.


You can use an SQL database for really large TMs:

https://cafetran.freshdesk.com/support/solutions/folders/6000058183
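
For readers without CafeTran, the "recall only segments that share words with the source document" idea can be sketched on a plain TMX export. This is a rough illustration in Python, not CafeTran's actual implementation: the function names, the `min_shared` cut-off and the simple word tokenisation are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def words(text):
    """Lower-cased set of word tokens (very naive tokenisation)."""
    return set(re.findall(r"\w+", text.lower()))


def source_text(tu, src_lang):
    """Return the text of the <seg> whose language starts with src_lang."""
    for tuv in tu.iter("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        if lang.lower().startswith(src_lang.lower()):
            seg = tuv.find("seg")
            if seg is not None:
                return "".join(seg.itertext())
    return ""


def recall(tmx_root, document_text, src_lang="en", min_shared=2):
    """Keep the TUs whose source side shares at least min_shared words
    with the document to be translated (hypothetical helper)."""
    doc_words = words(document_text)
    kept = []
    for tu in tmx_root.find("body").findall("tu"):
        shared = words(source_text(tu, src_lang)) & doc_words
        if len(shared) >= min_shared:
            kept.append(tu)
    return kept
```

Function words such as "the" count as shared words here, so in practice a stop-word list or a higher `min_shared` value would probably be needed to keep the extract small.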


 
Matthias Brombach
Matthias Brombach  Identity Verified
Germany
Local time: 01:59
Member (2007)
Dutch to German
My 5 cents Apr 9, 2022

Filip Látal wrote:

I have a document for translation and a very large TM. I know there are translation units in the TM relevant to my document but I suppose most of them are not relevant. What are my options if I want to extract only those units that are XX% matches in relation to my document and/or that contain some frequently occurring phrases from my document?


You could export the relevant segments from your TM using the filter function in the TM pane of Studio (filter by fields such as client, project, date or translator; check first under Properties which ones are available). Alternatively, run a pretranslation with a very low minimum match value so that every segment in your xlf file is filled with a TM hit that may be of some use, even in the 50-74% range. Be aware that with a "Big Mama" TM you may then also get entries that are unrelated to your client. As far as I know, neither Studio nor Deja Vu offers a function to export the segments that produced the results of your analysis. Other CAT tools may offer this; otherwise, try the approach suggested by Hans. Good luck, and please let us know which solution you opted for.
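
If no CAT tool does the export for you, a tool-independent workaround is to export the big TM to TMX and filter it with a small script. The following is only a sketch under several assumptions: the file paths, function names and the 0.75 threshold are made up for the example, and the similarity score from Python's `difflib.SequenceMatcher` is a stand-in that will not reproduce the fuzzy match percentages of Studio or any other CAT tool.

```python
import difflib
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tu_source(tu, src_lang):
    """Return the text of the <seg> whose language starts with src_lang."""
    for tuv in tu.iter("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        if lang.lower().startswith(src_lang.lower()):
            seg = tuv.find("seg")
            if seg is not None:
                return "".join(seg.itertext())
    return ""


def best_match(candidate, doc_segments):
    """Highest character-level similarity (0.0 to 1.0) between a TM
    source segment and any segment of the document."""
    return max(
        (difflib.SequenceMatcher(None, candidate.lower(), s.lower()).ratio()
         for s in doc_segments),
        default=0.0,
    )


def extract_relevant(in_path, out_path, doc_segments, src_lang="en", threshold=0.75):
    """Write a reduced TMX containing only TUs scoring >= threshold.
    Returns the number of TUs kept (hypothetical helper)."""
    tree = ET.parse(in_path)          # loads the whole TMX into memory
    body = tree.getroot().find("body")
    kept = 0
    for tu in list(body.findall("tu")):
        if best_match(tu_source(tu, src_lang), doc_segments) >= threshold:
            kept += 1
        else:
            body.remove(tu)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return kept
```

Every TM segment is compared with every document segment, so this will be slow on a very large TM; pre-filtering by shared words before the fuzzy comparison would cut the work considerably.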


 
Philippe Locquet
Philippe Locquet  Identity Verified
Portugal
Local time: 00:59
English to French
If it matters, TM strategies Apr 9, 2022

If you have the entire TM connected to the document you are translating, it won't matter much: your CAT tool will check for matches against the whole TM. The only difference will be performance. Depending on the CAT tool, TM system and computer hardware, you may experience a slight lag when looking up fuzzy matches for a segment or running a concordance search.

Now, if you want to get specific suggestions and not see matches with a different language style/vocabulary, there is not really a tool to help you do this after the fact (only concordance will help).
If you remember the dates when other relevant projects were translated, you could filter your TM by those dates and extract the matching units.

To avoid such problems, you can change your TM strategy from one "Big Momma" TM to separate, project-specific TMs.
Another option is to use custom attributes. These can be used at a later point to perform targeted extractions.
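
As an illustration of date-based and attribute-based extraction, here is a small Python sketch that works on a TMX export. It assumes the export carries the standard TMX `creationdate` attribute on each `<tu>` and stores custom attributes as `<prop>` elements; the property name `x-client` used in the usage note is hypothetical, and the names your CAT tool actually writes should be checked in a real export first.

```python
import xml.etree.ElementTree as ET


def tu_props(tu):
    """Collect the <prop type="...">value</prop> children of a TU.
    If a type occurs more than once, the last value wins."""
    return {p.get("type"): (p.text or "") for p in tu.findall("prop")}


def matches(tu, date_from=None, date_to=None, props=None):
    """True if the TU was created within [date_from, date_to]
    (YYYYMMDD strings) and carries all the requested prop values."""
    created = (tu.get("creationdate") or "")[:8]  # YYYYMMDD part of the TMX date
    if date_from and (not created or created < date_from):
        return False
    if date_to and (not created or created > date_to):
        return False
    have = tu_props(tu)
    return all(have.get(k) == v for k, v in (props or {}).items())


def filter_tus(root, **criteria):
    """Return the TUs under <body> that satisfy the given criteria."""
    return [tu for tu in root.find("body").findall("tu") if matches(tu, **criteria)]
```

A call such as `filter_tus(root, date_from="20210101", date_to="20211231", props={"x-client": "Acme"})` would then return only the units created in 2021 that are tagged with that (made-up) client value.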

Hope this helps


 
Filip Látal
Czech Republic
Local time: 01:59
English to Czech
TOPIC STARTER
Thank you for your answers Apr 11, 2022

Thank you to everybody for your insights!
I deliberately did not disclose my purpose so as not to limit the scope of possible answers. I wanted a TM extract to be able to adapt my MT engine to my particular project and obtain better quality MT content.
Currently I'm playing around with settings for populating project translation memories in Trados Studio, which is the most easily accessible option for me.


 

