binwalk

binwalk magic

23 replies to this topic

#1 Icecube

Icecube

    Gold Member

  • Team Reboot
  • 1063 posts
  •  
    Belgium

Posted 18 November 2012 - 09:21 PM

File Name: binwalk
File Submitter: Icecube
File Submitted: 18 Nov 2012
File Updated: 10 Feb 2014
File Category: Miscellaneous

Description:

Binwalk is a tool for searching a given binary image for embedded files and executable code. Specifically, it is designed for identifying files and code embedded inside of firmware images. Binwalk uses the libmagic library, so it is compatible with magic signatures created for the Unix file utility.
Binwalk also includes a custom magic signature file which contains improved signatures for files that are commonly found in firmware images such as compressed/archived files, firmware headers, Linux kernels, bootloaders, filesystems, etc.
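
To make the idea of scanning at every offset concrete, here is a minimal Python sketch. It is not binwalk's implementation (binwalk hands the matching to libmagic); the three-entry `SIGNATURES` table and the `scan` function are hypothetical names chosen only for illustration.

```python
# Minimal sketch of offset-independent signature scanning.
# SIGNATURES is a tiny illustrative table, not binwalk's magic file.
SIGNATURES = {
    b"\x1f\x8b\x08": "gzip compressed data",
    b"PK\x03\x04": "ZIP archive (local file header)",
    b"hsqs": "SquashFS filesystem (little endian)",
}


def scan(data: bytes):
    """Return (offset, description) pairs for every signature hit, sorted by offset."""
    hits = []
    for magic, description in SIGNATURES.items():
        start = 0
        while True:
            offset = data.find(magic, start)
            if offset == -1:
                break
            hits.append((offset, description))
            start = offset + 1  # keep looking after this hit
    return sorted(hits)


# Fake "firmware": 16 bytes of padding, a gzip magic, filler, then a ZIP magic.
image = b"\x00" * 16 + b"\x1f\x8b\x08" + b"\xff" * 5 + b"PK\x03\x04"
for offset, description in scan(image):
    print(f"{offset:#x}  {description}")
```

A real scanner also has to validate each hit (header fields, sizes, checksums), because short byte sequences show up by chance in large images.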

http://code.google.com/p/binwalk/

Usage:

binwalk.exe -m magic.binwalk file_to_investigate
http://code.google.c...walk/wiki/Usage


The archive contains binwalk compiled in a cygwin environment.


Update:

Newer versions (>= 1.0) are written in Python instead of C, so it should be quite easy to get it working on Windows (and it doesn't need Cygwin anymore).
You can get this version from http://binwalk.org

  • Brito and Sakimichi like this

#2 joakim

joakim

    Silver Member

  • Team Reboot
  • 912 posts
  • Location:Bergen
  •  
    Norway

Posted 19 November 2012 - 10:54 AM

This tool is of great help when inspecting files for their content, not just firmware binaries but binaries in general.

#3 Icecube

Icecube

    Gold Member

  • Team Reboot
  • 1063 posts
  •  
    Belgium

Posted 19 November 2012 - 01:10 PM

It is also possible to write and use your own magic file to find the things you want in a file:
http://reboot.pro/14..._25#entry163066

#4 Sha0

Sha0

    WinVBlock Dev

  • Developer
  • 1682 posts
  • Location:reboot.pro Forums
  • Interests:Booting
  •  
    Canada

Posted 19 November 2012 - 02:54 PM

I tried to compile a version with MinGW instead of Cygwin, but it was taking more than 2 hours to try to "port" it, so I finally gave up. If I figure it out, I'll post it here, too. It's a very useful tool; like an exhaustive file.

#5 Wonko the Sane

Wonko the Sane

    The Finder

  • Advanced user
  • 16066 posts
  • Location:The Outside of the Asylum (gate is closed)
  •  
    Italy

Posted 19 November 2012 - 03:00 PM

I'll throw this on the table (and duck really quick :ph34r: ;)):
Any way to make use of a "good" set of definitions like the ones that come with TRiD?
http://mark0.net/soft-trid-e.html

:cheers:
Wonko

#6 joakim

joakim

    Silver Member

  • Team Reboot
  • 912 posts
  • Location:Bergen
  •  
    Norway

Posted 19 November 2012 - 05:36 PM

TRiD itself is different in that it only predicts the type based on a file's first set of bytes, for instance the first 4 bytes. It is thus absolutely worthless for identifying content hidden in binary files. However, you could of course grab its "database" and plug it into something else that can scan content for these signatures. Either way, binwalk seems more solid as it also scans for patterns other than the signature present in the first bytes (for instance when the first 10 bytes are chopped off).

#7 Wonko the Sane

Wonko the Sane

    The Finder

  • Advanced user
  • 16066 posts
  • Location:The Outside of the Asylum (gate is closed)
  •  
    Italy

Posted 19 November 2012 - 08:50 PM

TRiD itself is different in that it only predicts the type based on a file's first set of bytes, for instance the first 4 bytes. It is thus absolutely worthless for identifying content hidden in binary files. However, you could of course grab its "database" and plug it into something else that can scan content for these signatures. Either way, binwalk seems more solid as it also scans for patterns other than the signature present in the first bytes (for instance when the first 10 bytes are chopped off).

Not really. :dubbio:
In the sense that yes :), TrID as an app scans only "files as a whole" and not "files to search other files within", but the recognition patterns can be much more complex than "a few first bytes".

The advantage of having a way to use the definitions is that there is already a vast library and making a new one is easy if you have a few "samples":
http://mark0.net/soft-tridscan-e.html

:cheers:
Wonko

#8 Icecube

Icecube

    Gold Member

  • Team Reboot
  • 1063 posts
  •  
    Belgium

Posted 20 November 2012 - 01:39 AM

The advantage of having a way to use the definitions is that there is already a vast library and making a new one is easy if you have a few "samples":
http://mark0.net/soft-tridscan-e.html

:cheers:
Wonko

I think that for most formats not enough files were scanned, which results in poor quality of those signatures.
There are signatures with more than 100 zeros or that only match 1 byte.

I'll throw this on the table (and duck really quick :ph34r: ;)):
Any way to make use of a "good" set of definitions like the ones that come with TRiD?
http://mark0.net/soft-trid-e.html

:cheers:
Wonko

I quickly tried to write a converter to convert the TrID XML files (http://mark0.net/soft-tridnet.html) to libmagic format.
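
For readers who want to see what such a conversion could look like, here is a minimal Python sketch. It is not the attached converter. The XML layout it reads (`Info/FileType` plus `FrontBlock/Pattern` elements holding hex `Bytes` and a decimal `Pos`) is an assumption about the TrID definition files, the sample definition is invented, and `trid_to_magic` is a hypothetical name. Each pattern becomes one `string` test, nested one level deeper than the previous one, with the description on the deepest line.

```python
import xml.etree.ElementTree as ET

# Invented sample in the ASSUMED TrID layout; real definitions may differ.
SAMPLE = """<TrID>
  <Info><FileType>ZIP compressed archive</FileType></Info>
  <FrontBlock>
    <Pattern><Bytes>504B0304</Bytes><Pos>0</Pos></Pattern>
    <Pattern><Bytes>00</Bytes><Pos>7</Pos></Pattern>
  </FrontBlock>
</TrID>"""


def escape(raw: bytes) -> str:
    """Write every byte as a \\xNN escape so no magic-file metacharacter slips through."""
    return "".join(f"\\x{b:02x}" for b in raw)


def trid_to_magic(xml_text: str) -> str:
    """Convert one TrID-style definition into a nested libmagic entry."""
    root = ET.fromstring(xml_text)
    description = root.findtext("Info/FileType", default="unknown")
    patterns = root.findall("FrontBlock/Pattern")
    lines = []
    for level, pattern in enumerate(patterns):
        offset = int(pattern.findtext("Pos"))
        raw = bytes.fromhex(pattern.findtext("Bytes"))
        line = f"{'>' * level}{offset}\tstring\t{escape(raw)}"
        if level == len(patterns) - 1:
            line += f"\t{description}"  # only the deepest test prints the description
        lines.append(line)
    return "\n".join(lines)


print(trid_to_magic(SAMPLE))
```

The sketch ignores TrID's string searches entirely, since their exact semantics are an open question in this topic.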

You can get a compiled version of file from here (I didn't test this Windows version, so I hope it still supports the features used in the magic file):
http://gnuwin32.sour...ckages/file.htm

To scan a file (and not stop after finding a match: add -k):
file -k -m trid.magic file_to_scan

For now a few warnings will be displayed because some of the text output for the fileformat is too long.

trid.magic, 4796: Warning: description `ADPCM (?) compressed file recorded by some MP3 Players/Voice re' truncated

trid.magic, 5557: Warning: description `TwinVQF (Transform-domain Weighted INterleave Vector Quantizati' truncated

trid.magic, 7259: Warning: description `ImageMagick Machine independent File Format bitmap (with rem) (' truncated

trid.magic, 7642: Warning: description `IBM Softcopy Reader (Bookmanager) Bookshelf (and Book) index fi' truncated

...

The fileformats are now organised alphabetically in the magic file. This gives some problems because the best defined fileformats should be in the beginning of the magic files.

Example: Some ZIP files are first detected as Autodesk FLIC Image Files (the magic consists only of zero bytes):
Autodesk FLIC Image File (extensions: flc, fli, cel) (CEL)\012- ZIP compressed archive (ZIP)
TrID also makes use of just searching for some strings. I am not yet certain what such a string search means: do we need to find all strings or not.

Hopefully someone is willing to rearrange the file formats (or point out the "bad/non-specific" ones) in the magic file.

Attached Files



#9 Wonko the Sane

Wonko the Sane

    The Finder

  • Advanced user
  • 16066 posts
  • Location:The Outside of the Asylum (gate is closed)
  •  
    Italy

Posted 20 November 2012 - 11:37 AM

I think that for most formats not enough files were scanned, which results in poor quality of those signatures.
There are signatures with more than 100 zeros or that only match 1 byte.

Well, it's not "perfect", I would however count it as a "second opinion" tool.
I use it often when recovering data to post-process files recovered with PHOTOREC and usually it does a good work.

I quickly tried to write a converter to convert the TrID XML files (http://mark0.net/soft-tridnet.html) to libmagic format.

Good. :)

You can get a compiled version of file from here (I didn't test this Windows version, so I hope it still supports the features used in the magic file):
http://gnuwin32.sour...ckages/file.htm

I seem to remember I have used in the past another Win32 port, I'll see if I can find it. :unsure:

The fileformats are now organised alphabetically in the magic file. This gives some problems because the best defined fileformats should be in the beginning of the magic files.



TrID also makes use of just searching for some strings. I am not yet certain what such a string search means: do we need to find all strings or not.

I think, but I have no actual data to confirm this, that each "found match" increases the "percentage of confidence" in the result.

Hopefully someone is willing to rearrange the file formats (or point out the "bad/non-specific" ones) in the magic file.

I guess it should be not that difficult, though I wonder which metrics could be used for the "sorting" (or more loosely for the "priorities"). :dubbio:
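
One candidate metric, offered only as a sketch: the complaint earlier in the topic was about signatures made of zeros or matching a single byte, so counting the non-zero bytes of a pattern gives a crude "specificity" score to sort by. This is not something the thread settled on, and it is not how libmagic computes its own strength; `specificity` and `order_signatures` are hypothetical names and the sample patterns are made up.

```python
def specificity(pattern: bytes) -> int:
    """Count the non-zero bytes; all-zero or one-byte patterns score low."""
    return sum(1 for b in pattern if b != 0)


def order_signatures(signatures):
    """Sort (name, pattern) pairs so the most specific patterns come first."""
    return sorted(signatures, key=lambda item: specificity(item[1]), reverse=True)


# Made-up sample patterns, for illustration only.
samples = [
    ("all zeros", b"\x00" * 7),
    ("ZIP", b"PK\x03\x04"),
    ("one byte", b"\x4d"),
]
for name, pattern in order_signatures(samples):
    print(specificity(pattern), name)
```

With this ordering an all-zero pattern lands at the bottom of the file and can no longer shadow a ZIP signature.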

What I thought could be an added value however is not as much the actual existing archive of definitions, but rather the quick way that Tridscan provides to create a specific pattern file from a number of samples, in other words an automated way of making a pattern file "on the spot".

I know nothing of the .magic format, I'll have a look at it, maybe it's possible to make a quick batch to make a conversion from the generated .xml to .magic, it shouldn't be that difficult.

OT, but not much, and just for the record, some time ago I found a nice simple XML -> variables converter (that later disappeared) in case of need it is here:
http://www.911cd.net...ndpost&p=159829

:cheers:
Wonko

#10 MedEvil

MedEvil

    Platinum Member

  • .script developer
  • 7771 posts

Posted 20 November 2012 - 12:56 PM

Sorry for disturbing the discussion, but can anyone give me an example for a real world application for this?

:cheers:

#11 Sha0

Sha0

    WinVBlock Dev

  • Developer
  • 1682 posts
  • Location:reboot.pro Forums
  • Interests:Booting
  •  
    Canada

Posted 20 November 2012 - 01:28 PM

Sorry for disturbing the discussion, but can anyone give me an example for a real world application for this?

Sure.

#12 MedEvil

MedEvil

    Platinum Member

  • .script developer
  • 7771 posts

Posted 20 November 2012 - 01:48 PM

That's exactly the point, i don't get.
It is a tool to extract files from an embedded image. How is that useful, when it can't create an embedded image with the changed data?

Editing an image is useful. Just reading it, not so much. ;)

:cheers:

#13 Sha0

Sha0

    WinVBlock Dev

  • Developer
  • 1682 posts
  • Location:reboot.pro Forums
  • Interests:Booting
  •  
    Canada

Posted 20 November 2012 - 04:42 PM

That's exactly the point, i don't get.
It is a tool to extract files from an embedded image. How is that useful, when it can't create an embedded image with the changed data?

Editing an image is useful. Just reading it, not so much. ;)

Here is the first paragraph from the web-page I shared:

The ability to analyze a firmware image and extract data from it is extremely useful. It can allow you to analyze an embedded device for bugs, vulnerabilities, or GPL violations without ever having access to the device.

If you never have access to the device (as mentioned in that paragraph), why would you want to edit the firmware image?

I have personally had the pleasure to use binwalk to discover a vulnerability as well as a GPL violation (reported on June 14th, 2012). The vulnerability allowed me to gain root access to an otherwise completely locked-out operating system, without changing any firmware image. This root access allowed me to implement a secure "back door" as well as useful features that the product lacked, but which my employer (at the time) would benefit from.

If you don't agree with the three uses given in that quoted paragraph, then here's another one: Finding out where things are going to be. WvBootDD needs to know where NTLdr (or rather, the OsLoader.Exe inside) is going to have certain items in memory. Icecube very generously automated most of this discovery process using binwalk and a specially-crafted "magic" file.
  • Icecube likes this

#14 MedEvil

MedEvil

    Platinum Member

  • .script developer
  • 7771 posts

Posted 20 November 2012 - 05:32 PM

Who cares about GPL violations? Are you a lawyer or work at least for one?

Also what good is looking for a bug at the bottom most level, if one can't fix it?
I don't go through the trouble of disassembling an exe, to find a bug, as long as someone else has still the source code and is fixing reported bugs.
A.) It's a waste of my time and B.) It's completely useless information for someone, who works at the source code level.

How do you implement anything without write access?

:cheers:

#15 Wonko the Sane

Wonko the Sane

    The Finder

  • Advanced user
  • 16066 posts
  • Location:The Outside of the Asylum (gate is closed)
  •  
    Italy

Posted 20 November 2012 - 05:42 PM

Who cares about GPL violations? Are you a lawyer or work at least for one?

Also what good is looking for a bug at the bottom most level, if one can't fix it?
I don't go through the trouble of disassembling an exe, to find a bug, as long as someone else has still the source code and is fixing reported bugs.
A.) It's a waste of my time and B.) It's completely useless information for someone, who works at the source code level.

How do you implement anything without write access?

:cheers:

Good :), your absolutely irrelevant comments have been posted and a note has been taken of them.

Now why don't you go writing something (since reading only is not complying with your standards or preferred activities)? :unsure: :dubbio:

:cheers:
Wonko
  • Brito likes this

#16 MedEvil

MedEvil

    Platinum Member

  • .script developer
  • 7771 posts

Posted 20 November 2012 - 08:30 PM

What the heck are you talking about? And why?

I asked for a real world example, where this program is useful.

As answers i get:
- Check for GPL violations. Since none of us is a lawyer this would be pretty pointless.
- Find a bug at the machine code level, even though this info is completely useless to someone working on the source code level.

And last but not least this:

If you never have access to the device (as mentioned in that paragraph), why would you want to edit the firmware image?

WvBootDD needs to know where NTLdr(or rather, the OsLoader.Exe inside) is going to have certain items in memory.

What for would this be needed, if we don't have access to the device? Or do we have all of a sudden access to it? :confused1:

If you find the practical relevance of the above examples, through your thorough reading :worship:, enlighten me!

:cheers:

#17 Sha0

Sha0

    WinVBlock Dev

  • Developer
  • 1682 posts
  • Location:reboot.pro Forums
  • Interests:Booting
  •  
    Canada

Posted 21 November 2012 - 01:00 AM

Who cares about GPL violations? Are you a lawyer or work at least for one?

As mentioned, I reported the GPL violation. I did this because I want the GPL-licensed source code. The operating system was Linux along with many familiar utilities, but some of them had been modified. I wanted to read the source code of the modifications so I could gain insight into specific situations.

Also what good is looking for a bug at the bottom most level, if one can't fix it?

As mentioned, discovering a bug was not one of the uses that I personally had. I discovered a vulnerability which resulted in huge benefits. I was able to save large amounts of time and automate the capture of security event data. If, instead, I had submitted feature requests, they would have to go through whatever review and development processes the vendor has, and might even be rejected.

I don't go through the trouble of disassembling an exe, to find a bug, as long as someone else has still the source code and is fixing reported bugs.

Although I didn't use binwalk to find bugs, the vendor of a particular product I remember was a bit slow and sometimes unsure about fixing the bugs. If they hadn't been in violation of the GPL (discovered with binwalk), perhaps I could've read the source code and fixed the bugs myself!

How do you implement anything without write access?

After using binwalk to extract the root filesystem, I was able to examine the startup scripts and I found a vulnerability because those startup scripts examined certain NVRAM variables, which I could control on the device. I didn't need to modify any firmware, digital signatures, checksums, etc. I just needed to set magic NVRAM variables that would exercise the vulnerability. Pretty simple. I was able to develop the "back door" without the device. That means that you could've asked me to develop it for you and you wouldn't've needed to ship me the unit.

As answers i get:
- Check for GPL violations. Since none of us is a lawyer this would be pretty pointless.
- Find a bug at the machine code level, even though this info is completely useless to someone working on the source code level.

You've just listed two answers. I quoted three uses, in the last post. Why is one missing?

Also, I disagree with your opinion about the GPL and lawyers, which is why I gave a link to the GPL Violations web-site. Feel free to actually go to the web-site and read the material in order to try to gain an understanding of why someone might disagree with your opinion about needing to be or work for a lawyer.

And last but not least this:


If you never have access to the device (as mentioned in that paragraph), why would you want to edit the firmware image?

I didn't understand why you would read the first paragraph of the given web-page and then ask about editing the firmware image, when the first paragraph clearly states a usefulness without any access to the device. How do those two items go together?


WvBootDD needs to know where NTLdr (or rather, the OsLoader.Exe inside) is going to have certain items in memory.

What for would this be needed, if we don't have access to the device? Or do we have all of a sudden access to it? :confused1:

I think there are two problems, here.

Firstly, you dropped one of the three uses described in the first paragraph of the given web-page. The subject you dropped was probably the most useful I've personally experienced, so I think it's certainly worth noting.

Secondly, you cut pieces of my response and pasted them together out of context, so you appear to have the mistaken impression that they are related. If you read again, I think you'll find that I gave a fourth use which is my own and unrelated to the previous three, which were from the first paragraph of the given web-page. Since it is unrelated, then any discussion about "device access" is irrelevant.

Let me try a fifth example. This one is unrelated to the previous four: You have an image of a filesystem, but all of the filesystem meta-data has been badly damaged. You can scan the filesystem image and discover the locations of and extract the data for any blobs of data which are recognized by binwalk... Think about text documents, pictures, videos, programs, archives, etc. This scenario can arise and has happened to me due to accidentally having two computers access the same disk filesystem at the same time. Is that a useful example, in your opinion?

#18 Icecube

Icecube

    Gold Member

  • Team Reboot
  • 1063 posts
  •  
    Belgium

Posted 21 November 2012 - 01:39 AM

New version of TrID2magic attached:
  • Python source code to convert TrID xml files to libmagic format is included
  • Shortened filetype names that were too long
  • Treat a single quote in the string of a "<String>" key as a null byte (\x00) ==> it is possible that the single quote represents all non-printable characters instead of a null byte.
  • To prevent the "Autodesk FLIC Image File" signature from being found first, e.g. when scanning a ZIP file, it is given zero strength.


    17 string \x00\x00\x00\x00\x00
    >7 string \x00
    >>15 string \x00 Autodesk FLIC Image File (extensions: flc, fli, cel) (CEL)
    !:strength * 0

Strength documentation:

    An optional strength can be supplied on a separate line which refers to the current magic description using the following format:

        !:strength OP VALUE

    The operand OP can be: +, -, *, or / and VALUE is a constant between 0 and 255.

    This constant is applied using the specified operand to the currently computed default magic strength.
    
    
Magic file format documentation: http://linux.die.net/man/5/magic
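
The arithmetic behind `!:strength OP VALUE` can be sketched in a few lines of Python. This only mirrors the wording of the documentation quoted above. The default strength of 70 used in the example is a made-up number, integer division for `/` is an assumption, and `apply_strength` is a hypothetical helper, not a libmagic function.

```python
def apply_strength(default: int, op: str, value: int) -> int:
    """Apply a '!:strength OP VALUE' line to an already computed default strength."""
    if not 0 <= value <= 255:
        raise ValueError("VALUE must be between 0 and 255")
    if op == "+":
        return default + value
    if op == "-":
        return default - value
    if op == "*":
        return default * value
    if op == "/":
        return default // value  # integer division is an assumption
    raise ValueError(f"unknown operand: {op!r}")


# 70 is an invented default strength; '* 0' pushes the entry to the bottom.
print(apply_strength(70, "*", 0))
```

Whatever the default strength of the all-zero FLIC pattern is, multiplying by 0 makes it lose against any other entry that matches.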

Attached Files



#19 Icecube

Icecube

    Gold Member

  • Team Reboot
  • 1063 posts
  •  
    Belgium

Posted 21 November 2012 - 02:16 AM

That's exactly the point, i don't get.
It is a tool to extract files from an embedded image. How is that useful, when it can't create an embedded image with the changed data?

Editing an image is useful. Just reading it, not so much. ;)

:cheers:

To be able to write in a certain format, you need to know the exact format that is used. It is impossible to write a tool that can handle each image automatically without knowing how the image was built or how the part you want to replace plays together with other parts of the binary (e.g. after replacing an lzma stream, you might need to fix a checksum at another place in the binary).
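
A small Python sketch of the checksum problem, using a completely hypothetical container: a 4-byte little-endian CRC32 followed by the payload it covers. Real firmware formats differ, which is exactly the point of the paragraph above; `build`, `is_valid` and `replace_region` are names invented for this illustration.

```python
import struct
import zlib

HEADER = struct.Struct("<I")  # hypothetical header: CRC32 of everything after it


def build(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32."""
    return HEADER.pack(zlib.crc32(payload)) + payload


def is_valid(image: bytes) -> bool:
    """Check the stored CRC32 against the payload that follows it."""
    (stored,) = HEADER.unpack_from(image)
    return stored == zlib.crc32(image[HEADER.size:])


def replace_region(image: bytes, offset: int, length: int, new_blob: bytes) -> bytes:
    """Swap payload[offset:offset+length] for new_blob and recompute the CRC."""
    payload = image[HEADER.size:]
    patched = payload[:offset] + new_blob + payload[offset + length:]
    return build(patched)


image = build(b"AAAAoldBBBB")
naive = image[:8] + b"new" + image[11:]  # patch the bytes, forget the checksum
print(is_valid(naive))                                  # False: stale CRC
print(is_valid(replace_region(image, 4, 3, b"newer")))  # True: CRC recomputed
```

The patching itself is trivial; knowing that the checksum exists, where it lives and what it covers is the format knowledge a generic tool cannot guess.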

A sixth example of what would have been a good use case for binwalk:
Finding and extracting embedded floppy images (flat, zipped, bzipped, gzipped) in floppy creators so they can be used by MEMDISK/grub4dos:

Hello. I installed ImDisk in Vista Home Basic 32bits.
Copying files to the virtual floppy using Windows Explorer works.
But I experienced some problems using other programs that want to write to the virtual floppy.

For example, tools based on Ontrack Floppy Creator, such as:
  • IBM Disk manager 9.57
  • Ontrack Advisor 5.00 Trial
  • Ontrack Disk Manager 10.46 (2 floppy disks)
  • Samsung Disk Manager 10.42
  • Seagate Disk Manager 9.56a
  • Seagate Disk Wizard Standard Edition 10.45.06 (2 floppy disks)
  • Western Digital DataLifeGuard 5.04f

Such tools are created by floppy creators, and need a 1440K 3.5" A: formatted floppy under Windows to create the actual floppy tool. Some of them may need changing their properties (compatibility with XP), some may not.

In that topic, Wonko and I used gsar to find unique bytes that identify the kind of floppy creator and the offset of the (possibly compressed) image (the last page contains the final BAT file):
http://reboot.pro/to...virtual-floppy/
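
For the gzipped case, the "find the offset, then unpack from there" step can be sketched in Python as follows. This is neither the BAT file from that topic nor binwalk's extractor; `extract_gzip_streams` is a hypothetical helper, and the test blob is a fake executable built on the spot.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"


def extract_gzip_streams(blob: bytes):
    """Yield (offset, decompressed bytes) for each gzip stream that really decompresses."""
    start = 0
    while True:
        offset = blob.find(GZIP_MAGIC, start)
        if offset == -1:
            return
        decoder = zlib.decompressobj(31)  # 31 = expect a gzip header and trailer
        try:
            data = decoder.decompress(blob[offset:])
            if decoder.eof:  # a complete stream ended; trailing bytes are ignored
                yield offset, data
        except zlib.error:
            pass  # the magic bytes appeared by chance; not a real stream
        start = offset + 1


# Fake "floppy creator": a 32-byte stub, a gzipped image, then trailing bytes.
creator = b"MZ" + b"\x90" * 30 + gzip.compress(b"floppy image" * 100) + b"trailing"
offset, image = next(extract_gzip_streams(creator))
print(offset, len(image))
```

Trying to decompress at every candidate offset is slow but weeds out false positives, which a plain byte search cannot do.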

#20 Wonko the Sane

Wonko the Sane

    The Finder

  • Advanced user
  • 16066 posts
  • Location:The Outside of the Asylum (gate is closed)
  •  
    Italy

Posted 21 November 2012 - 10:31 AM

I will add - more generally - that binwalk can be used as a "carving" tool on unknown/unidentified chunks of data (practical example: chunk of sectors recovered from a failing/partially failed hard disk).
Some info and commonly used tools for carving in Data Recovery:
http://www.forensics...ecovery#Carving
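
As a minimal illustration of header/footer carving, here is a Python sketch that cuts out everything between a start marker and the next end marker. The JPEG markers used in the example are the standard start-of-image and end-of-image bytes; `carve` is a hypothetical helper, and real carving tools deal with fragmentation and nested markers that this sketch ignores.

```python
def carve(blob: bytes, header: bytes, footer: bytes):
    """Return every chunk that begins with header and ends at the next footer."""
    pieces = []
    start = blob.find(header)
    while start != -1:
        end = blob.find(footer, start + len(header))
        if end == -1:
            break  # header without footer: truncated data, stop here
        pieces.append(blob[start:end + len(footer)])
        start = blob.find(header, end + len(footer))
    return pieces


# Fake chunk of recovered sectors containing two JPEG-like blobs.
sectors = (
    b"\x00" * 512
    + b"\xff\xd8\xff\xe0JFIFdata\xff\xd9"
    + b"\x00" * 100
    + b"\xff\xd8\xff\xe1more\xff\xd9junk"
)
for piece in carve(sectors, b"\xff\xd8\xff", b"\xff\xd9"):
    print(len(piece))
```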

:cheers:
Wonko

#21 Brito

Brito

    Platinum Member

  • .script developer
  • 10616 posts
  • Location:boot.wim
  • Interests:I'm just a quiet simple person with a very quiet simple life living one day at a time..
  •  
    European Union

Posted 21 November 2012 - 10:44 AM

- Check for GPL violations. Since none of us is a lawyer this would be pretty pointless.

My friend, I'm certainly not a lawyer and yet assuring legal compliance with GPL terms is part of my daily job.

You hire a contractor company to deliver you something and they sign a contract assuring that you'll get all intellectual property rights; however, on our side we still need to do as Reagan once said: "Trust, but verify".

A typical lawyer won't know how to verify this compliance, this is a task for technically savvy people.

#22 MedEvil

MedEvil

    Platinum Member

  • .script developer
  • 7771 posts

Posted 21 November 2012 - 04:25 PM

@Sha0
Thanks for the reply, i understand you now better.
You reported the GPL violation, so that someone else hires a lawyer to force the company to release the source code, so you could then read it.

Maybe it's that i'm too old to spend so much time, waiting on a solution, or simply the fact, that in more than 90% of all cases, i don't have the chance of a snowball in hell to ever see the sourcecode, that i always go straight to disassembling and patching the binary file.

@Icecube + Wonko
Thanks for the examples.

@Nuno
No, you don't need to be a lawyer to check for violations, but you need one, to make use of those findings!

Why are you checking for GPL violations in the binary?
When my company hired an outside source, the sourcecode was always part of the deal.

:cheers:

#23 Wonko the Sane

Wonko the Sane

    The Finder

  • Advanced user
  • 16066 posts
  • Location:The Outside of the Asylum (gate is closed)
  •  
    Italy

Posted 21 November 2012 - 05:39 PM

Cannot say exactly HOW to use the attached :ph34r: but it consists of:
  • a half-@§§ed batch to extract some data fields from the set of TrId .xml's into a TAB delimited .txt
  • xmlparser.exe
  • the resulting TAB delimited.txt made from current triddefs_xml.rar that can be imported in *any* spreadsheet
  • the file as .xls (just in case you have troubles with the senseless "I am smarter than you and than your data when I import" approach that a number of spreadsheet apps adopt)
It is easy to add a "priority number" in (say) the empty TrId field (and /or group files by "source") and re-order the filenames to feed them in the "right" order to Icecube's nice libmagic converter, but WHAT :w00t: could be the exact classifying/attributing priorities criteria? :unsure:

:cheers:
Wonko

Attached Files



#24 Icecube

Icecube

    Gold Member

  • Team Reboot
  • 1063 posts
  •  
    Belgium

Posted 02 January 2013 - 03:28 PM

Uploaded Binwalk Version 0.5.0.

 

I also added a (very basic) binwalk.cmd version so you don't need to specify the "-m magic.binwalk" option anymore.

The current batch has the disadvantage that you need to specify the absolute path to the file you want to investigate (or a path relative to the directory where you extracted this 7-zip archive).

Also, if you kill binwalk.exe with Ctrl+C, your current path will change to the directory where you extracted this 7-zip archive.

So feel free to improve it.

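@echo off
rem Wrapper batch for binwalk.exe to make it possible for binwalk to find
rem magic.binarch, magic.bincast, magic.binwalk and extract.conf in the same
rem directory as binwalk.exe.

rem Remember the caller's directory, then switch to the directory of this batch.
rem /d also changes the drive; the quotes keep paths with spaces intact.
pushd "%CD%"
cd /d "%~dp0"

break

rem %~dp0 already ends with a backslash, so none is added before binwalk.exe.
start /B /W "BinWalk" "%~dp0binwalk.exe" %*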
popd
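
One possible direction for improving the wrapper, sketched in Python rather than batch: resolve relative file arguments against the caller's directory before starting binwalk.exe, and change the working directory only for the child process. This is an untested idea for the cygwin build, not a drop-in replacement; `build_invocation` and `main` are hypothetical names, and it assumes, like the batch, that binwalk.exe finds its magic files in its own working directory.

```python
import os
import subprocess
import sys


def build_invocation(script_dir: str, args: list, caller_cwd: str):
    """Return (command, working directory) for running binwalk.exe from its own folder."""
    fixed = []
    for arg in args:
        candidate = os.path.join(caller_cwd, arg)
        # Turn existing relative paths into absolute ones; leave options alone.
        if not arg.startswith("-") and os.path.exists(candidate):
            fixed.append(os.path.abspath(candidate))
        else:
            fixed.append(arg)
    return [os.path.join(script_dir, "binwalk.exe")] + fixed, script_dir


def main() -> int:
    here = os.path.dirname(os.path.abspath(__file__))
    command, workdir = build_invocation(here, sys.argv[1:], os.getcwd())
    # cwd= changes the directory for the child process only, so the caller's
    # shell stays where it was even if binwalk is interrupted with Ctrl+C.
    return subprocess.call(command, cwd=workdir)
```

Adding `sys.exit(main())` at the bottom of the file, placed next to binwalk.exe, would turn the sketch into a launcher.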

 


  • Brito likes this



