Channel: Convert UTF-16 LE to UTF-8 in windows via command line - Super User

Convert UTF-16 LE to UTF-8 in windows via command line


(question re-written to be more useful)

I have a batch script which will interact with command line programs, take their output, and then perform decisions based on that output.

One of the programs I need to interact with is fairly old, so I am stuck with its quirks. When I redirect its output to a text file, that text file is in the UTF-16 LE encoding.

Here's how I do that:

program -parameter > resultat.txt

Under Windows 7, this encoding seems to be troublesome for cmd/batch work, because you cannot read the contents of such a text file into a variable.

Here is an example (it only reads the first line of the text file):

set /p Var=<resultat.txt
echo %Var%
cmd /k

Instead of the first line of the file, it prints "ECHO is on.", which is what echo reports when %Var% is empty.

Also, if you use "type" to print the contents of the text file, there is weird spacing, suggesting it's not properly being processed.
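
To illustrate where the weird spacing probably comes from, here is a minimal sketch, assuming Python 3 is available (it is not part of the original setup). It encodes a hypothetical one-line output, "OK", both ways and prints the raw bytes. In UTF-16 LE every ASCII character is followed by a NUL byte, which is a plausible explanation for the gaps shown by "type" and for "set /p" ending up with nothing useful.

```python
# Illustration only: "OK" stands in for one line of the program's output.
line = "OK\r\n"

utf16le_bytes = line.encode("utf-16-le")  # two bytes per character, no BOM
utf8_bytes = line.encode("utf-8")         # one byte per ASCII character

print(utf16le_bytes)  # b'O\x00K\x00\r\x00\n\x00'
print(utf8_bytes)     # b'OK\r\n'
```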

Attempted solution [1] - Powershell

After some research, I found that PowerShell can convert text file encodings, using the following method:

Get-Content -Path "path\file.txt" | Out-File -FilePath "path\new_file.txt" -Encoding <encoding>

Using Notepad++, I then worked out which encoding I actually need.

UTF-8 (no BOM) is the encoding I need; for plain ASCII text it is the same as what Notepad calls "ANSI". With this encoding, both loading text files into variables and the "type" command work flawlessly. How do I know? If I open the piped text file in Notepad and resave it with "ANSI" encoding, everything works.

-Encoding ascii

...is the option that should have worked, since for ASCII-only text its output is byte-identical to UTF-8 (no BOM). However, it seems unable to handle the UTF-16 LE source as piped, and does not produce usable output. When I opened the resulting file in Notepad++, it was identified as UTF-16 LE "Unix", which was odd.

Funnily enough, if I resave the piped text file as "Unicode" in Notepad, this produces a UTF-16 LE file with a BOM, which the conversion above turns into a perfect UTF-8 file. At this point I extended my research to the question "How can I add a BOM to a UTF-16 LE file?", since I could combine that with the PowerShell approach. Spoiler alert: I was unsuccessful in finding a decent answer.
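
For reference, the UTF-16 LE BOM is just the two bytes FF FE at the start of the file, so adding it amounts to prefixing those bytes. The following is a minimal sketch of that idea, assuming Python 3 is available; the function name add_utf16le_bom and the sample text are made up for illustration.

```python
import codecs

def add_utf16le_bom(data: bytes) -> bytes:
    """Prefix raw UTF-16 LE bytes with the BOM (FF FE) unless already present."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data
    return codecs.BOM_UTF16_LE + data

# Hypothetical sample standing in for the piped, BOM-less output.
raw = "résultat\r\n".encode("utf-16-le")
with_bom = add_utf16le_bom(raw)

print(with_bom[:2])  # b'\xff\xfe'
```

To apply this to a file, read it with open(path, "rb"), pass the bytes through the function, and write the result with open(path, "wb"); binary mode keeps the CR LF line endings untouched.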

-Encoding utf8

...is another similar option, but it produces a UTF-8 file with a BOM (the equivalent of saving as "UTF-8" in Notepad), and the resulting output is corrupted.
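
One possible reason a UTF-8 BOM shows up as corruption in batch work is that the BOM is three extra bytes (EF BB BF) in front of the first line, which tools that do not know about it treat as ordinary characters. A small sketch, assuming Python 3 is available and using a made-up sample line, shows the difference between the two variants:

```python
# Illustration only: "OK" stands in for the first line of the converted file.
text = "OK\r\n"

with_bom = text.encode("utf-8-sig")  # UTF-8 preceded by the 3-byte BOM
without_bom = text.encode("utf-8")   # plain UTF-8, no BOM

print(with_bom)     # b'\xef\xbb\xbfOK\r\n'
print(without_bom)  # b'OK\r\n'
```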

So to sum up:

I am looking for a command line tool/method (open or proprietary, 1st or 3rd party) that can perform either of the following conversions:

  1. UTF-16 LE - Windows(CR LF) straight to UTF-8 - Windows(CR LF)

  2. UTF-16 LE - Windows(CR LF) to UTF-16 LE BOM - Windows(CR LF)
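
As a rough illustration of conversion 1, here is a minimal sketch assuming Python 3 is installed and callable from the batch script, which is an assumption and not part of the original setup. The names utf16le_to_utf8 and convert_file are hypothetical. Files are handled in binary mode so the CR LF line endings pass through unchanged, and a leading UTF-16 LE BOM is dropped if one happens to be present.

```python
import codecs

def utf16le_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16 LE bytes (with or without BOM) to UTF-8 without BOM."""
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    return data.decode("utf-16-le").encode("utf-8")

def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes, convert, and write dst as raw bytes."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(utf16le_to_utf8(data))

# Hypothetical two-line sample standing in for the piped output.
sample = "line 1\r\nline 2\r\n".encode("utf-16-le")
print(utf16le_to_utf8(sample))  # b'line 1\r\nline 2\r\n'
```

A call such as convert_file("resultat.txt", "resultat_utf8.txt") would then produce a file that "set /p" and "type" should be able to read; the output file name here is only an example.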

