what is the best way to download web sites for translation in Word?
Thread poster: Patricia Rosas
Patricia Rosas
Patricia Rosas  Identity Verified
United States
Local time: 07:56
Spanish to English
+ ...
In memoriam
Aug 17, 2009

Hi, everyone!

A client has just asked me to download the company's web site, put each page in the left-hand column of a table in Word, and provide a translation in the right-hand column.

When I did this, I noticed that some of the text appears in a position that is radically different from the position where it actually appears on the page. This may not matter, but it led me to wonder if there is a better way to do this.

Thanks,
Patricia


 
Piotr Bienkowski
Piotr Bienkowski  Identity Verified
Poland
Local time: 16:56
English to Polish
+ ...
I'd do it this way Aug 18, 2009

Patricia Rosas wrote:

Hi, everyone!

A client has just asked me to download the company's web site, put each page in the left-hand column of a table in Word, and provide a translation in the right-hand column.

When I did this, I noticed that some of the text appears in a position that is radically different from the position where it actually appears on the page. This may not matter, but it led me to wonder if there is a better way to do this.

Thanks,
Patricia


Use WinHTTrack to download the entire web site in source format.

Use a CAT tool, like Swordfish or TagEditor, to translate the content.

If the client insists on a table, you can get a table view from the translation preview in Swordfish, and use that in Word.

You can try Swordfish for a month for free. In a CAT tool you can also merge all pages into one file, which is also an advantage, if there are many short pages.
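
If the table has to be built by hand instead, a small script can at least prepare the two columns. The following is only a sketch under assumptions: the page text is taken to be already saved as plain text with blank lines between paragraphs, and the names `two_column_rows`, `to_tsv` and `page_text` are made up for illustration.

```python
import csv
import io


def two_column_rows(paragraphs):
    """Pair each non-empty source paragraph with an empty translation cell."""
    return [(p.strip(), "") for p in paragraphs if p.strip()]


def to_tsv(rows):
    """Serialize the rows as tab-separated text, one table row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(("Source", "Translation"))
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical page text, split into paragraphs on blank lines.
page_text = "Bienvenidos a la editorial.\n\nNuestro catálogo\n\n"
rows = two_column_rows(page_text.split("\n\n"))
tsv = to_tsv(rows)
print(tsv)
```

Tab-separated text like this can be pasted into Word and turned into a table with the Convert Text to Table command, using tabs as the separator.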

HTH

Piotr


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 16:56
Member (2006)
English to Afrikaans
+ ...
That is precisely the reason, then Aug 18, 2009

Patricia Rosas wrote:
When I did this, I noticed that some of the text appears in a position that is radically different from the position where it actually appears on the page.


Yes, and that is why the client wants a two-column file in MS Word... so that he can see which translated paragraph is the translation of which original paragraph.


 
esperantisto
esperantisto  Identity Verified
Local time: 17:56
Member (2006)
English to Russian
+ ...
SITE LOCALIZER
This has nothing to do with the download Aug 18, 2009

Patricia Rosas wrote:

…text appears in a position that is radically different from the position where it actually appears on the page.


This is a matter of page layout, i.e. the HTML code, so downloading the pages differently won't change anything: what you download is a static snapshot of code that may have been generated on demand from dynamic content.

Ask your client to supply the contents in plain-text format.


 
Patricia Rosas
Patricia Rosas  Identity Verified
United States
Local time: 07:56
Spanish to English
+ ...
TOPIC STARTER
In memoriam
thanks! Aug 19, 2009

Thanks Samuel, Piotr, and Esperantisto for your replies.

The client doesn't want to get involved in the details of how I get the material, so they won't provide the plain-text format. But it seems that things are working out just fine. I copy each page to a .txt file, and then cut and paste it into the table in Word. Sometimes, because of the order issues, it is a bit confusing when I look at the web page and try to find the text, which is often hidden in a flyout. But it seems to be working out and I'm getting any help/answers I need from the webmistress.


 
Boris Sigalov
Boris Sigalov
Local time: 17:56
English to Russian
Outlandish... Aug 19, 2009

Patricia Rosas wrote:

The client doesn't want to get involved in the details of how I get the material


It's the client's responsibility to deliver the documents to be translated to a service provider...


Just imagine such a dialogue:

- Hello, I've got a TV set that needs to be repaired.

- Fine, we would be happy to help you. Where could we find it?

- What?!? It's none of my business: I don't want to get involved in the details of how you get my TV set... you are professionals and should know everything yourself...


Isn't it absurd?


 
Patricia Rosas
Patricia Rosas  Identity Verified
United States
Local time: 07:56
Spanish to English
+ ...
TOPIC STARTER
In memoriam
I really don't understand this ... Aug 19, 2009

I apologize if I'm missing your point, Boris, but why should my client have to bother if I can access everything via Internet? This is a direct client (a book publisher) and not an agency giving me work to do.

Perhaps their webmaster could send the files easily enough, but it would take away from time she needs to do other work, and since they are paying me, and I'm able to download everything without a problem, I don't mind.

My only concern is whether I'm going to miss text that needs to be translated by doing it in the two-column table format.


 
heikeb
heikeb  Identity Verified
Ireland
Local time: 15:56
Member (2003)
English to German
+ ...
. Aug 19, 2009

Patricia Rosas wrote:

I apologize if I'm missing your point, Boris, but why should my client have to bother if I can access everything via Internet? This is a direct client (a book publisher) and not an agency giving me work to do.

Perhaps their webmaster could send the files easily enough, but it would take away from time she needs to do other work, and since they are paying me, and I'm able to download everything without a problem, I don't mind.

My only concern is whether I'm going to miss text that needs to be translated by doing it in the two-column table format.


Well, it is a highly unprofessional approach. Not only does it cost you extra time and work to get everything into a table, but just imagine the poor webmaster at the end, having to replace all the individual strings with your text manually, one by one.

If they sent you the entire website (all files included, correct folder structure, etc.) and you used a CAT tool for the translation, all their webmaster would have to do in the end is upload the translated files. So I wouldn't worry about asking the webmaster for a few moments of their time; they'd be happy to be spared the extra work!

It is simply not efficient to translate a website in any format other than the original one. If you're planning on just copying what you see on each web page into a table, you might also miss text such as drop-down menus.

As you say, your client is a book publisher, not a website translation specialist. But you are the specialist and should be able to advise your client on the best, most time-effective and efficient approach for this project.


 
Patricia Rosas
Patricia Rosas  Identity Verified
United States
Local time: 07:56
Spanish to English
+ ...
TOPIC STARTER
In memoriam
is a cat tool necessary? Aug 19, 2009

Heike:

Thanks for answering but I still have doubts about this. The web master was the one who asked me to put it in table format after I suggested doing it some other ways. Why is a CAT tool necessary?

Would you (or others) please help me understand something about CAT tools, too?

I have a WordFast license but never felt I gained any ground using it. For example, I almost always have to reorganize the structure of a sentence to get a satisfactory translation. So, how does WordFast deal with things if the end of the sentence (in the source text) appears at the beginning of the sentence (in the target text)?

Similarly, suppose I translate an adjective a certain way in one context but need to use a synonym in another (perhaps for sake of rhythm or other considerations). Wouldn't the software normally use the same word over and over?

Thanks very much!
Patricia


 
heikeb
heikeb  Identity Verified
Ireland
Local time: 15:56
Member (2003)
English to German
+ ...
CAT tools Aug 19, 2009

Patricia Rosas wrote:

Heike:

Thanks for answering but I still have doubts about this. The web master was the one who asked me to put it in table format after I suggested doing it some other ways. Why is a CAT tool necessary?

Would you (or others) please help me understand something about CAT tools, too?

I have a WordFast license but never felt I gained any ground using it. For example, I almost always have to reorganize the structure of a sentence to get a satisfactory translation. So, how does WordFast deal with things if the end of the sentence (in the source text) appears at the beginning of the sentence (in the target text)?

Similarly, suppose I translate an adjective a certain way in one context but need to use a synonym in another (perhaps for sake of rhythm or other considerations). Wouldn't the software normally use the same word over and over?

Thanks very much!
Patricia


CAT tools have several extremely useful functions, not just translation memory.
As for TM: it doesn't work on a word basis, but on a segment basis. That is, an individual adjective would not give you a 100% match when used in a different sentence. You'd get a 100% match only if the entire segment is identical. And even if you get a 100% match, you can of course change any bit to your liking.

If you need to reorganize the structure of a sentence, you probably have only a partial match. If you're unhappy with getting the translation of at least part of the sentence presented by the CAT tool, you can set the threshold higher so that only the better matches - with less restructuring - are displayed. I usually have it at a rather low setting, though, since sometimes even low matches of 30% require only very few changes in the target. Don't be discouraged too early from using a CAT tool; it might take a little while to get used to it, but it's definitely worth it!
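
To make segment-level matching and the threshold concrete, here is a minimal sketch. It assumes a word-level similarity ratio from Python's standard `difflib` as a stand-in for the scoring a real CAT tool uses (each tool has its own formula), and the names `match_percent`, `lookup` and the sample `memory` are hypothetical.

```python
from difflib import SequenceMatcher


def match_percent(a, b):
    """Word-level similarity of two segments as a percentage (0-100)."""
    ratio = SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()
    return round(100 * ratio)


def lookup(segment, memory, threshold=70):
    """Return (percent, source, target) hits at or above the threshold, best first."""
    hits = [(match_percent(segment, src), src, tgt) for src, tgt in memory.items()]
    return sorted((hit for hit in hits if hit[0] >= threshold), reverse=True)


# Hypothetical translation memory: whole segments, not single words.
memory = {
    "Add to shopping cart": "In den Warenkorb legen",
    "Contact us": "Kontakt",
}

print(lookup("Add to shopping cart", memory))        # identical segment
print(lookup("Add to cart", memory))                 # partial (fuzzy) match
print(lookup("Add to cart", memory, threshold=95))   # higher threshold hides it
```

Raising `threshold` hides the weaker matches and lowering it shows more of them. In either case the suggestion is only a starting point that the translator is free to rewrite.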

How much you actually profit from the translation memory depends, of course, on the kind of text you usually translate. But even if I don't get a single match from the TM, I always work within a CAT tool because of all the other benefits they offer.

Apart from translation memory and other useful features, CAT tools help you tremendously when working with tagged texts such as HTML files. If you open the source file of a web page, you see all the code there. You might get a page packed with text, but only a few words actually need to be translated. The translatable text usually sits between tags or in quoted attribute values, but there are also scripts (e.g. JavaScript) containing quoted text that usually doesn't need to be translated. That's why inserting your table translation into the source file would be a lot of work for the webmaster. Maybe they are simply not aware of the existence/capabilities of CAT tools.

CAT tools present you with only the text to translate; tags that directly affect the translatable text are (or should be) visible (the way these tags are displayed varies with CAT tools). So instead of a mess of script, html code and translatable text you can easily identify the translatable portions. External tags and code that you don't have to worry about are not displayed at all. Only tags that you need for the translation (formatting, anchors for links) are shown. Tags are protected, so you don't accidentally change them. CAT tools usually also alert you if you forgot to insert a tag in the target segment. They also display all text, so for drop-down menus, for instance, you get all options to translate as they are all visible in the code. In order to translate html pages, ideally you should have some rudimentary understanding of html code and scripts to recognize the formatting tags or link anchors you probably need to reorder to some extent in the target segment.
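
As a rough illustration of what such a filter does, the sketch below pulls the translatable text out of an HTML page with Python's standard `html.parser`. It is a simplification, not how any particular CAT tool works: it assumes that only element text plus `alt` and `title` attributes are translatable, it skips `script` and `style` content, and unlike a real CAT filter it splits a sentence at every inline tag. The class name `TranslatableText` and the sample page are made up.

```python
from html.parser import HTMLParser

SKIPPED = {"script", "style"}   # content that is code, not text
TEXT_ATTRS = {"alt", "title"}   # attributes assumed to hold translatable text


class TranslatableText(HTMLParser):
    """Collect visible text and selected attribute values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        for name, value in attrs:
            if name in TEXT_ATTRS and value and value.strip():
                self.segments.append(value.strip())

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace and line breaks
        if text and not self.skip_depth:
            self.segments.append(text)


sample = """
<html><head><title>Catalog</title>
<script>var msg = "do not translate";</script></head>
<body><img src="logo.png" alt="Company logo">
<select><option>Spain</option><option>Mexico</option></select>
<p>Read <a href="/books">our books</a> today.</p></body></html>
"""

parser = TranslatableText()
parser.feed(sample)
parser.close()
for segment in parser.segments:
    print(segment)
```

Both drop-down options are listed even though a browser shows only one of them at a time, while the quoted string inside the script is left out.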

(That's another thing: Usually, web pages contain a good number of links. If you work in a table, you'd have to make sure to mark the translated words that are supposed to link to some other page or segment and make sure the webmaster can identify which source link corresponds to which target link. No need to worry about that using a CAT tool.)
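
The missing-tag alert can be approximated in a few lines. This is a sketch under assumptions, not the check any specific CAT tool performs: tags are found with a simple regular expression, and a tag counts as present only if it appears in the target exactly as written in the source (so a link whose `href` changes in the translated site would be flagged). The function name `missing_tags` and the example sentences are hypothetical.

```python
import re

# Matches simple opening and closing tags such as <a href="..."> or </b>.
TAG = re.compile(r"</?[a-zA-Z][^>]*>")


def missing_tags(source, target):
    """Return the source tags that do not appear (often enough) in the target."""
    remaining = TAG.findall(target)
    missing = []
    for tag in TAG.findall(source):
        if tag in remaining:
            remaining.remove(tag)   # each target tag can satisfy one source tag
        else:
            missing.append(tag)
    return missing


source = 'Read <a href="/books">our books</a> today.'
good = 'Lesen Sie heute <a href="/books">unsere Bücher</a>.'
bad = 'Lesen Sie heute unsere Bücher</a>.'

print(missing_tags(source, good))  # nothing missing, although the order changed
print(missing_tags(source, bad))   # the opening link tag was lost
```

Because the tags travel with the segment, the link stays attached to the right words even when the sentence is reordered in the target language.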

If you're not familiar with how Wordfast handles html files (can't help you there, I've never used that program), you can just download a webpage and open it in Wordfast. You should see that it is quite easy to translate in that format. And after you're done with the translation, the file is ready to upload to the localized website.
So you don't have to find and copy each snippet of text into your table, and the webmaster doesn't have to do the same in reverse.

Even if you need to familiarize yourself with using the CAT tool for HTML pages, the entire process should be so much faster that you will have ample time to do so!


A CAT tool is not necessary, but in this case, I would ask the webmaster to extract all translatable strings for you. This way, they can
- make sure that indeed all translatable text is covered
- use their own system to note source location in order to facilitate the re-insertion of the translation in the correct files.
- simply have better control over the process.
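
One way a webmaster might organise such an extraction is to give every string a key and keep the page as a template. The sketch below is purely illustrative and assumes a setup the thread does not describe: the template, the keys `heading` and `intro`, and the helper names are all invented.

```python
from string import Template

# Hypothetical page template: the text has been replaced by keys.
template = Template("<h1>$heading</h1><p>$intro</p>")

# Strings the webmaster extracts and sends to the translator.
source_strings = {
    "heading": "Bienvenidos",
    "intro": "Nuestro catálogo de libros",
}

# The translator returns the same keys with translated values.
translated = {
    "heading": "Welcome",
    "intro": "Our book catalog",
}


def untranslated_keys(source, target):
    """Keys that are missing or empty in the translation."""
    return sorted(key for key in source if not target.get(key, "").strip())


def render(page_template, strings):
    """Re-insert the strings into the page by key."""
    return page_template.substitute(strings)


print(untranslated_keys(source_strings, translated))
print(render(template, translated))
```

With keys in place, checking coverage and re-inserting the translation are mechanical steps for the webmaster, and the translator never has to guess where a string belongs.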

If they still insist you do the copy/pasting of source text into your table, make sure to add a disclaimer that you can't guarantee to catch all translatable text and make sure to be paid for the extra time it will take you to prepare the table. Maybe giving them an estimate of the additional time needed might convince them that using a CAT tool would be a good idea for all involved!


 
Patricia Rosas
Patricia Rosas  Identity Verified
United States
Local time: 07:56
Spanish to English
+ ...
TOPIC STARTER
In memoriam
thank you, Heike! Aug 20, 2009

Heike,

I'm extremely grateful that you took your precious time to respond in such depth. I'll write again (briefly, I hope) tomorrow once I've absorbed everything.

Again, thanks for your generosity!

Patricia


 
Johnny Speiermann
Johnny Speiermann
Denmark
Local time: 16:56
English to Danish
+ ...
Never download a web site for localization Sep 22, 2009

You should never translate files that were downloaded directly from a web site.

Nowadays most web sites use dynamic content where information is collected from both static strings in HTML files (or ASP, PHP or other file format) and from databases.

When this is the case there will also be a lot of repetitions as most pages will have a similar layout, and if you are translating for example product texts some strings will appear a lot of times even if they probably only appear once in the original files.

I run a few web shops myself, and a good example is the page with technical specifications for items like surveillance cameras. For every camera there is a table listing 40 different technical specifications (approximately 100 words). These specifications are the same for all cameras. If I download the pages using HTTrack or any other tool, this will generate 100 words of repetitions for each of these items instead of a total of just 100 words. Even if you use a CAT tool for the translation, there will be a lot of extra work involved in handling this huge number of unnecessary words.

In addition menus and navigation elements will be downloaded for each 'page' on the web site, and this will add even more repetitions to the picture.
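
The effect is easy to quantify once the text of each downloaded page has been split into segments. The following sketch assumes exactly that; the page names, the segments and the function name `repetition_report` are invented for illustration, and words are counted by simple whitespace splitting.

```python
from collections import Counter


def word_count(segment):
    """Count words by splitting on whitespace."""
    return len(segment.split())


def repetition_report(pages):
    """Compare the words downloaded in total with the words in unique segments."""
    counts = Counter(seg for segments in pages.values() for seg in segments)
    total = sum(word_count(seg) * n for seg, n in counts.items())
    unique = sum(word_count(seg) for seg in counts)
    return {
        "total_words": total,
        "unique_words": unique,
        "repeated_words": total - unique,
    }


# Hypothetical downloaded pages: the menu and the specification repeat on each.
pages = {
    "camera-a.html": ["Home", "Resolution: 2 megapixels", "Camera A"],
    "camera-b.html": ["Home", "Resolution: 2 megapixels", "Camera B"],
}

report = repetition_report(pages)
print(report)
```

In this toy example a third of the downloaded words are repetitions. On a shop with hundreds of product pages sharing one specification table, the share would be far higher.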

In addition some web sites make use of the keywords used in Google, so the content on the web site will depend on the search the user made in Google.

And there will most likely be a lot of texts that will not be directly accessible by downloading the files using these tools - again because the content might depend on what the user does and selects.

So it's actually in everyone's best interest to ask for the source files, no matter what file formats they might be in. That's the only way to ensure proper localization of a web site.


 
Patricia Rosas
Patricia Rosas  Identity Verified
United States
Local time: 07:56
Spanish to English
+ ...
TOPIC STARTER
In memoriam
hmmm.... Sep 22, 2009

Well, Johnny, I appreciate that you took the time to write such a detailed message. However, the web mistress told me to do it this way, and the site was translated weeks ago. I haven't clicked on every link, but it appears to be working fine.

http://www.editorialazabache.com/Inicio/tabid/100/List/1/Language/en-US/Default.aspx

Anyway, I'm sure the information you shared will come in handy for people in the future.

Thanks,
Patricia


 
Johnny Speiermann
Johnny Speiermann
Denmark
Local time: 16:56
English to Danish
+ ...
web mistress :-) Sep 23, 2009

Good to know that it worked out. But in most cases it will cause problems. I guess the customer must also have spent quite some time copy/pasting, which they could have saved.

 

