Tool to convert HTML-Codes to text?

The problem: I have some large text files that contain a mixture of umlauts (ä, ö, ü etc.) and their representation in HTML-code (&auml;, &ouml; etc.).

What I would like is to have all HTML-code converted to actual umlauts.

The solution would be a tool that makes this conversion.

Does anybody know of such a tool?

There’s always find/replace, of course, but I believe this is the sort of thing TextSoap is good at; I doubt that particular usage is in the defaults, but TS allows you to very easily create custom “cleaners”, which you can then apply to any selected block of text.

I think I found it: It’s called “HTML Entity to Text” (in the HTML group) in TextSoap. But who wants TextSoap when you can have ‘sed’?!? :smiley:

sed -n -f html2txt.sed website1.html > website2.html

“html2txt.sed” is the following script:

s/&auml;/ä/g; s/&Auml;/Ä/g; s/&uuml;/ü/g; s/&Uuml;/Ü/g; s/&ouml;/ö/g; s/&Ouml;/Ö/g; s/&szlig;/ß/g; p;
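For what it's worth, the same idea can be sketched without a separate script file, and extended to the numeric form of the entities (&#228; and friends) in case the export produces those instead of the named ones. This is only a rough sketch: the function name decode_umlauts is made up for illustration, and it assumes the files are UTF-8 and that only the German umlauts and ß need converting.

```shell
#!/bin/sh
# decode_umlauts: hypothetical helper, reads text on stdin, writes to stdout.
# Without -n, sed prints every line by default, so no trailing "p" is needed.
decode_umlauts() {
    sed -e 's/&auml;/ä/g'  -e 's/&#228;/ä/g' \
        -e 's/&Auml;/Ä/g'  -e 's/&#196;/Ä/g' \
        -e 's/&ouml;/ö/g'  -e 's/&#246;/ö/g' \
        -e 's/&Ouml;/Ö/g'  -e 's/&#214;/Ö/g' \
        -e 's/&uuml;/ü/g'  -e 's/&#252;/ü/g' \
        -e 's/&Uuml;/Ü/g'  -e 's/&#220;/Ü/g' \
        -e 's/&szlig;/ß/g' -e 's/&#223;/ß/g'
}
```

Usage would be something like: decode_umlauts < website1.html > website2.html. Any other entities (&amp;, &nbsp; and so on) are left untouched, as are umlauts that are already literal.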

Thanks for all the hints. As it happens, I found a way to avoid the creation of that HTML-code in the export, so the problem no longer exists, but I noted all suggestions carefully just in case… :exclamation: