Who knows a regular expression for this "garbage"?
Thread poster: Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Sep 30, 2022

Something must have gone wrong during the creation of a project that a client has sent me.

The project contains segments like this one (see below). What would be a valid regular expression to filter segments like this?

(I assume that this is the content of an SVG graphic in a Schema ST4 file...)

https://www.dropbox.com/s/np6mlcbhxuloie0/schema_st4_garbage.txt.zip?dl=1


[Edited at 2022-09-30 09:30 GMT]


 
Elena Feriani
Italy
Local time: 06:41
Member
French to Italian
Filter by character length? Sep 30, 2022

Hi Hans,
That's a pretty long segment. If you are using Trados, you can use the Advanced Display Filter to filter by character length.

EDIT: I meant segment length

[Edited at 2022-09-30 10:49 GMT]


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Why didn't I think of that? (Because I was overwhelmed?) Sep 30, 2022

Elena Feriani wrote:

Hi Hans,
That's a pretty long segment. If you are using Trados, you can use the Advanced Display Filter to filter by character length.


Hi Elena,

Yes, it is a Trados project, but I prefer to translate it with CafeTran Espresso on my Mac.

Your suggestion is good; I can filter on segment length in CafeTran too.

Thanks!

H
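
For readers who would rather script the same idea outside a CAT tool, here is a minimal sketch of filtering by segment length. The function name, the sample segments, and the 300-character threshold are all assumptions chosen for illustration; neither Trados nor CafeTran is involved.

```python
def is_probably_garbage(segment: str, max_chars: int = 300) -> bool:
    """Flag a segment whose character count exceeds the threshold.

    The default of 300 is an assumed cut-off, not a value taken from any tool.
    """
    return len(segment) > max_chars


# Hypothetical sample data: one ordinary sentence and one very long,
# path-like string standing in for the embedded graphic content.
segments = [
    "Ziehen Sie die Schraube fest.",
    "M10 20 L30 40 " * 50,
]

flagged = [s for s in segments if is_probably_garbage(s)]
print(len(flagged), "of", len(segments), "segments flagged")
```

A length filter is crude, since a genuinely long paragraph of prose would be flagged as well, but it needs no knowledge of what the unwanted segments contain.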


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
That was easy (if you know how to do it :))! Sep 30, 2022

Like this:

[Screenshot: Screen Shot 2022-09-30 at 11.52.14]

Of course the regular expression isn't completely valid, since the segments contain non-letter characters too, but it is fine enough.



[Edited at 2022-09-30 09:55 GMT]


 
Andrzej Mierzejewski
Poland
Local time: 06:41
Polish to English
What about the original text? Sep 30, 2022

Hans Lenting wrote:
(I assume that this is the content of an SVG graphic in a Schema ST4 file...)


Did you receive the original text in PDF format for reference? If so, you can see what it is. If not, I suggest requesting such a file from your client ASAP.

HTH


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Yes, I received a PDF Sep 30, 2022

Andrzej Mierzejewski wrote:

Hans Lenting wrote:
(I assume that this is the content of an SVG graphic in a Schema ST4 file...)


Did you receive the original text in PDF format for reference? If so, you can see what it is. If not, I suggest requesting such a file from your client ASAP.

HTH


Yes, I received a PDF. And I'm pretty sure that these long segments belong to SVG images, since each of them is followed by a segment containing a long path ending in .svg.

Thanks for your help!


 
Andrzej Mierzejewski
Poland
Local time: 06:41
Polish to English
If so, then... Sep 30, 2022

Such long character strings are nothing other than images. I'd simply delete them during translation. No special procedure or macro is needed. Afterwards, in the formatting stage, I'd copy and paste the illustrations from the source PDF into the target DOC file (or whatever your final format is). That should satisfy the client unless special requirements have been given.

Amendment: to make the work easier for myself, I'd replace each such segment with a short note, e.g. "Page so-and-so, Figure so-and-so", in both columns, Source and Target.

HTH

[Edited at 2022-09-30 12:10 GMT]

[Edited at 2022-09-30 12:22 GMT]


 
Stepan Konev
Russian Federation
Local time: 07:41
English to Russian
Regex Sep 30, 2022

Hans Lenting wrote:
Of course the regular expression isn't completely valid, since the segments contain non-letter characters too, but it is fine enough.
You can use \S{30,} instead. \S matches any character except whitespace. Also, I believe 300 is too much; 30 should suffice.
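
To see how \S{30,} behaves, here is a small sketch using Python's `re` module. The sample strings are invented for illustration (the actual segment content is not reproduced in this thread), and the letters-only pattern is a hypothetical stand-in for comparison, not the expression from the screenshot.

```python
import re

# The pattern suggested above: a run of 30 or more non-whitespace characters.
LONG_RUN = re.compile(r"\S{30,}")

# Hypothetical letters-only pattern, included only to show why a
# letter class can miss strings that mix letters, digits and punctuation.
LETTERS_ONLY = re.compile(r"[A-Za-z]{30,}")


def has_long_run(segment: str) -> bool:
    """Return True if the segment contains a long unbroken run of characters."""
    return LONG_RUN.search(segment) is not None


# Invented samples: a path-like string without spaces, and ordinary prose.
path_like = "M10,20L30,40" * 10
prose = "Ziehen Sie die Schraube fest."

print(has_long_run(path_like))
print(has_long_run(prose))
print(LETTERS_ONLY.search(path_like) is not None)
```

One caveat with a threshold of 30: long URLs and file paths in otherwise normal segments will match too, for example the Dropbox link in the first post. Raising the number reduces such false positives at the cost of missing shorter runs.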

