Batch replace text in Unicode .reg Files?


14 replies to this topic

#1 misty

misty

    Silver Member

  • Developer
  • 879 posts
  •  
    United Kingdom

Posted 28 May 2014 - 09:52 PM

Any suggestions for a commandline text replacement tool capable of searching for and replacing strings in unicode files?

I need to replace C: with X: in a couple of .reg files. Tried gsar but couldn't get it to work on the (unicode) .reg files.

Tried fnr.exe. Great tool and works well. As I was running it in Windows 8.1 (Enterprise Eval) I hadn't realised that it has .NET version 4 dependencies. A lightweight alternative would be good.

Or maybe gsar can do it and I'm just not using the correct commands.

:cheers:

#2 Wonko the Sane

Wonko the Sane

    The Finder

  • Advanced user
  • 14178 posts
  • Location:The Outside of the Asylum (gate is closed)
  •  
    Italy

Posted 29 May 2014 - 08:47 AM   Best Answer

I need to replace C: with X: in a couple of .reg files. Tried gsar but couldn't get it to work on the (unicode) .reg files.

Well, if you have it handy, nothing prevents you from using it, as it can use hex codes too.

To replace ALL occurrences of C:\ with X:\ (in a Unicode text file) that would be:





gsar -o -i -sC:x00:::x00\ -rX:x00:::x00\ <file>
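For readers without gsar at hand, here is a minimal Python sketch of the same replacement. It assumes the .reg file is UTF-16 little endian, and the name `swap_drive` is made up for illustration; it only mimics the effect of the command and says nothing about how gsar works internally.

```python
import re


def swap_drive(data: bytes, new_drive: str = "X") -> bytes:
    """Replace every C:\\ (either case) with <new_drive>:\\ in UTF-16LE data."""
    # Assumption: little-endian UTF-16. A leading BOM, if present, survives the round trip.
    text = data.decode("utf-16-le")
    swapped = re.sub(r"c:\\", lambda m: new_drive + ":\\", text, flags=re.IGNORECASE)
    return swapped.encode("utf-16-le")


# A line as it might appear in a .reg file (backslashes are doubled there).
sample = '"Path"="c:\\\\Windows;C:\\\\Tools"'.encode("utf-16-le")
result = swap_drive(sample)

# The search text as raw bytes: the gsar pattern spells out the same bytes by hand
# (and simply stops before the final 00).
needle_hex = "C:\\".encode("utf-16-le").hex(" ")
print(needle_hex)  # 43 00 3a 00 5c 00
```

To use it on a real file, read the file in binary mode, pass the bytes through `swap_drive` and write them back; working on a copy first would be prudent.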

:duff:

Wonko



#3 misty

Posted 29 May 2014 - 01:38 PM

@Wonko
I wasn't remotely surprised that your example worked. :thumbsup:

I was aware that I could use hex codes - hadn't realised that (some?) unicode text is padded with null values - at least that's what I'm assuming the :x00 in your example are.

Anyway, :cheers:

Misty

#4 Wonko the Sane

Posted 29 May 2014 - 02:37 PM

I was aware that I could use hex codes - hadn't realised that (some?) unicode text is padded with null values - at least that's what I'm assuming the :x00 in your example are.

Not really.
I mean they "appear" as "padded", but in reality they are not. :w00t:

ASCII uses a 1 byte code for each character (i.e. you have 16^2=256 character codes available, of which the first 128 are "standard" ASCII or 7-bit ASCII, and the rest are "Extended ASCII" and will render differently depending on codepage).
http://www.asciitable.com/
UNICODE, in the 2-byte form (UTF-16) that these .reg files use, has a 2 bytes code for each character (i.e. you have 16^4=65536 character codes available; characters beyond that range take two such codes).
http://www.unicodetables.com/
BUT the first 128 character codes are the same as ASCII (and the first 256 match the Latin-1 codepage).
http://www.unicode.o...s/PDF/U0000.pdf
http://www.unicode.o...s/PDF/U0080.pdf
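A quick way to check how the first 256 codes line up is to decode every possible byte value and compare. This is only a sketch: Latin-1 and cp1252 are picked here as two example codepages, they are not named in the posts.

```python
# Every possible single-byte value, 0x00 to 0xFF.
all_bytes = bytes(range(256))

# Latin-1 maps byte n straight to Unicode code point n.
latin1_matches = all(
    ord(ch) == b for b, ch in zip(all_bytes, all_bytes.decode("latin-1"))
)

# A different codepage disagrees in the upper half: byte 0x80 in cp1252
# is the euro sign, which lives at a much higher Unicode code point.
euro_code_point = ord(b"\x80".decode("cp1252"))

print(latin1_matches)        # True
print(hex(euro_code_point))  # 0x20ac
```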

The (CAPITAL) letter A is in ASCII Hex 41 and in Unicode Hex 0041.
The former will be (in a file) saved as:



41

whilst the latter will be (since the i386 platform is little endian):



4100
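The two byte sequences can be reproduced with Python's built-in codecs; the big-endian form is included only for comparison. The variable names are arbitrary.

```python
ascii_bytes = "A".encode("ascii")      # one byte per character
utf16_le = "A".encode("utf-16-le")     # two bytes, low byte first
utf16_be = "A".encode("utf-16-be")     # two bytes, high byte first

print(ascii_bytes.hex(), utf16_le.hex(), utf16_be.hex())  # 41 4100 0041
```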

The advantage of using (as I did in the provided example) a "mixed" mode is that you can easily have CaSe InSeNsItIvEnEsS, i.e. the provided line will "catch" both c and C, otherwise in "pure hex", you would need two commands:



gsar -o  -s:x43:x00:::x00:x5C -r:x58:x00:::x00:x5C <file>
gsar -o  -s:x63:x00:::x00:x5C -r:x78:x00:::x00:x5C <file>
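To see that the "mixed" and the "pure hex" search strings describe the same bytes, here is a small Python sketch that expands the two escapes used in this thread (`:xHH` for a hex byte and `::` for a literal colon). The function `decode_gsar_pattern` is hypothetical and covers only those two escapes; it is an illustration, not gsar's actual parser.

```python
def decode_gsar_pattern(pattern: str) -> bytes:
    """Expand ':xHH' and '::' escapes into raw bytes (illustration only)."""
    out = bytearray()
    i = 0
    while i < len(pattern):
        if pattern[i] != ":":
            out += pattern[i].encode("latin-1")  # ordinary character, taken as one byte
            i += 1
        elif pattern.startswith("::", i):
            out += b":"                          # escaped colon
            i += 2
        elif pattern.startswith(":x", i):
            out.append(int(pattern[i + 2:i + 4], 16))  # two hex digits
            i += 4
        else:
            raise ValueError(f"escape not handled by this sketch at position {i}")
    return bytes(out)


mixed = decode_gsar_pattern("C:x00:::x00\\")
pure_hex = decode_gsar_pattern(":x43:x00:::x00:x5C")
print(mixed.hex(" "))     # 43 00 3a 00 5c
print(mixed == pure_hex)  # True
```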

:duff:
Wonko



#5 homes32

homes32

    Gold Member

  • .script developer
  • 1028 posts
  • Location:Minnesota
  •  
    United States

Posted 29 May 2014 - 10:08 PM

whilst the latter will be (since the i386 platform is BIG ENDIAN):

4100

 
actually the i386 platform is LITTLE ENDIAN which is what I think you meant to say anyway as your example (41000000) is in little endian. big endian would be 00000041 if anyone cares...
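For anyone who does care, a short Python sketch of both byte orders. Note that the 8-digit values above are what 0x41 looks like as a 4-byte (32-bit) value; as the 2-byte unit from the earlier example, the same two orders give 4100 and 0041. The variable names are arbitrary.

```python
import struct
import sys

code = 0x0041  # capital letter A

le16 = struct.pack("<H", code).hex()  # 2 bytes, little endian
be16 = struct.pack(">H", code).hex()  # 2 bytes, big endian
le32 = struct.pack("<I", code).hex()  # 4 bytes, little endian
be32 = struct.pack(">I", code).hex()  # 4 bytes, big endian

print(le16, be16)     # 4100 0041
print(le32, be32)     # 41000000 00000041
print(sys.byteorder)  # byte order of the machine running this
```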

:ph34r:



#6 Wonko the Sane

Posted 30 May 2014 - 08:22 AM

My bad :blush: , I got it reversed (corrected).

 

:duff:

Wonko



#7 pscEx

pscEx

    Platinum Member

  • Team Reboot
  • 12701 posts
  • Location:Korschenbroich, Germany
  • Interests:What somebody else cannot do.
  •  
    European Union

Posted 01 June 2014 - 01:09 PM

I have been some days away, so late answer.

 

For my 2013 projects / plugins delivery I use rxrepl

 

After some trial and error I'm very satisfied with the functionality.

 

Peter



#8 Wonko the Sane

Posted 01 June 2014 - 02:09 PM

Good :).
JFYI, I also use a cannon to shoot at flies from time to time :w00t: :ph34r: ;), but that doesn't really mean it is *needed*.

I mean, FAQ's/FGA's:
Q. Is gsar the ideal "generic" tool for searching and replacing text in a UNICODE text file?
A. No.
Q. I already have gsar handy, can I use it to do a simple replacement of C:\ to X:\ in a UNICODE text file?
A. Yes.
And:
Q. Is gsar an ideal "generic" search and replace tool, capable of dealing with BOTH text files and binary ones?
A. Yes. Try reading its name as an acronym of General Search And Replace and read how its Author defines it:

General Search And Replace on files. The search and replace strings may consist of
any character in the range 0 - 255. This allows gsar to work with binary as well as text files.
A binary grep but with no regexp support.


Q. Can I use rxrepl for anything that is not a text file?
A. Yes, by cleverly using the support for escape characters. (something that is not made explicit on the tool's page, nor in readme.txt)

Occam's razor:
http://en.wikipedia....i/Occam's_razor

:duff:
Wonko



#9 misty

Posted 01 June 2014 - 02:44 PM

Hi Peter. Thanks for the link. The following program is also from the same author - Simple Search and Replace (SSR).

I tried SSR prior to Wonko posting a working search and replace string for gsar - it worked well. I do however really like gsar due to its very small size and portability. And it was handy - and redistributable.

Regards,

Misty

#10 misty

Posted 01 June 2014 - 02:45 PM

...JFYI, I also use a cannon to shoot at flies from time to time :w00t: :ph34r: ;), ...

That must really upset the cats :P

#11 pscEx

Posted 01 June 2014 - 02:59 PM

Hi Peter. Thanks for the link. The following program is also from the same author - Simple Search and Replace (SSR).

I tried SSR prior to Wonko posting a working search and replace string for gsar - it worked well. I do however really like gsar due to its very small size and portability. And it was handy - and redistributable.

Regards,

Misty

My reason to use rxrepl was that it was the only app I found to do:

 

Replace in the line:

<entry key="!Version">????.??.??</entry>

(? can be any actually unknown number) by the actual version, e.g. 2014.06.01

 

The command line is:

rxrepl -I rxrepltest.xml !Version\x22\x3e....\...\... -r !Version\x22\x3e2014.06.01
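The same idea can be tried out in Python's `re` module, which happens to accept the same `\x22` (double quote) and `\x3e` (greater-than) escapes inside a pattern. This is only a sketch of the regular expression itself; the variable names are arbitrary, and it does not claim anything about rxrepl's own options or regex dialect.

```python
import re

line = '<entry key="!Version">????.??.??</entry>'

# Four unknown characters, a literal dot, two unknown, a literal dot, two unknown.
pattern = r"!Version\x22\x3e....\...\..."

# The replacement is a plain string: Python does not expand \x escapes there.
replacement = '!Version">2014.06.01'

updated = re.sub(pattern, replacement, line)
print(updated)  # <entry key="!Version">2014.06.01</entry>
```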

BTW: Wonko's last FAQ is answered wrong. AFAIK rxrepl can replace every multi byte hexcode by a different multibyte hexcode, even with different lengths. Of course there is no pointer update or similar in compiled and linked files (see the hex-escapes in the above command line).

 

Peter



#12 Wonko the Sane

Posted 01 June 2014 - 03:16 PM


BTW: Wonko's last FAQ is answered wrong. AFAIK rxrepl can replace every multi byte hexcode by a different multibyte hexcode, even with different lengths. Of course there is no pointer update or similar in compiled and linked files (see the hex-escapes in the above command line).

Peter

Good to know. :)
Corrected.

That must really upset the cats :P

Naaah, they actually like it as the cannon gets warm (though they still prefer car bonnets)
http://img.cliparto....t-on-cannon.jpg

:duff:
Wonko

#13 pscEx

Posted 01 June 2014 - 03:34 PM

A. Yes, by cleverly using the support for escape characters. (something that is not made explicit on the tool's page, nor in readme.txt)

But it is displayed when using the help function ...

  The replacement argument text may include:
      - Standard escape characters ( \\ \a \b \e \f \n \r \t \u???? \x?? )
      - Pattern group matches ( \0 \1 \2 \{10} )

Peter



#14 Wonko the Sane

Posted 01 June 2014 - 03:47 PM

Look, you can stamp your feet as long and as hard as you want, but provide this snippet to a sample of people (already quite familiar with computing):

The replacement argument text may include:
- Standard escape characters ( \\ \a \b \e \f \n \r \t \u???? \x?? )
- Pattern group matches ( \0 \1 \2 \{10} )

 

and see how many people can read in it:

Notwithstanding anything printed above and in the readme.txt, this tool is not limited to text files, but can also be used as a pure hex file replacer by using the \x?? escape characters syntax. 

 

 

As often happens with this kind of tool, particularly the most powerful ones, the documentation is scarce, non-existent or too cryptic for any practical use, and obviously (as happens with, say, 99% of Linux originated tools) there are a zillion options that are not supported by corresponding examples.

 

Which does not of course mean that a given tool is not very good, only that it sucks in documentation and usability.

 

Take, as an extreme example, mkisofs, which I believe is one of the most used (and exceptionally good :)) tools for creating a .iso, and ask anyone (with a programming/scripting background but not specifically familiar with the tool) to create a working (bootable) .iso only with the help of its man page :whistling::

http://cdrecord.berl...kisofs-2.0.html

 

 

:duff:

Wonko



#15 pscEx

Posted 01 June 2014 - 04:11 PM

As often happens with this kind of tool, particularly the most powerful ones, the documentation is scarce, non-existent or too cryptic for any practical use, and obviously (as happens with, say, 99% of Linux originated tools) there are a zillion options that are not supported by corresponding examples.

 

I fully agree!

See my first post here:

After some trial and error ...

Peter :cheers:





