Fingerprints for functions inside binary executables?


10 replies to this topic

#1 Brito

Brito

    Platinum Member

  • .script developer
  • 10616 posts
  • Location:boot.wim
  • Interests:I'm just a quiet simple person with a very quiet simple life living one day at a time..
  •  
    European Union

Posted 02 January 2017 - 04:45 PM

Hello,

 

I'm looking into ways of creating a fingerprint of an executable file (e.g. for Windows, but it could also be Linux/Mac) based on its internal functions.

 

What do you think would be a good strategy to make this possible?

 

I have some ideas:

1) extract function names (not sure if available)

2) get the code for the function, generate similarity hash of the content

3) other methods?

 

Right now option 2) seems to be the best fit for this task (assuming a plain executable that is not packed with UPX or anything of the sort). After extracting the assembly code, it should be relatively straightforward to compare different functions to see if they are similar, as is already done for other binary comparisons.
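A minimal sketch of what option 2) could look like, assuming the raw bytes of each function have already been extracted. The helper names (`byte_ngrams`, `similarity`) and the choice of Jaccard similarity over 4-byte windows are illustrative assumptions, not something proposed in the thread:

```python
def byte_ngrams(code: bytes, n: int = 4) -> set:
    """Return the set of overlapping n-byte windows found in a code blob."""
    if len(code) < n:
        # Too short for a full window: treat the whole blob as one gram.
        return {code} if code else set()
    return {code[i:i + n] for i in range(len(code) - n + 1)}


def similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity (0.0 to 1.0) between the n-gram sets of a and b."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty blobs are considered identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Two hypothetical function bodies that differ in a single constant byte.
func_a = bytes.fromhex("558bec83ec108b45088945fc8be55dc3")
func_b = bytes.fromhex("558bec83ec208b45088945fc8be55dc3")
print(f"similarity: {similarity(func_a, func_b):.2f}")
```

Comparing raw bytes is sensitive to changes in addresses and register allocation, so a real tool would probably normalize the instructions first; that step is left out here.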

 

Do you know of any library (non-copyleft license, and hopefully running on Java) that could extract the assembly snippet for each function, or of a better approach to this goal?

 

My thanks and happy 2017 ahead!

 

:cheers:



#2 erwan.l

erwan.l

    Platinum Member

  • Developer
  • 3042 posts
  • Location:Nantes - France
  •  
    France

Posted 02 January 2017 - 06:37 PM

Hi Nuno,

 

I would say it depends on which platform your target code (the one you want to generate/check fingerprints on) was written for.

I see two big families: the interpreted ones and the native ones.

 

On the native ones, it is normally rather easy to disassemble the code down to Windows API calls: you could then fingerprint the Windows APIs it uses.

 

On interpreted ones, my understanding is that the code always relies on a runtime/framework (whatever word each vendor uses). You will find quite a few cases, but two stand out: Java and .NET.

 

And I am only talking about Windows, as I don't know much about the others, although I expect they work the same way.

 

I don't know of any library that does this, but I would be interested to hear of one.

 

My 2 cents :)

Regards,

Erwan



#3 joakim

joakim

    Silver Member

  • Team Reboot
  • 912 posts
  • Location:Bergen
  •  
    Norway

Posted 03 January 2017 - 11:17 PM

I am sure this is not exactly what you are after, but it's still an interesting approach and not too much OT;

http://computer.fore...-3-vendors.html



#4 Brito

Posted 04 January 2017 - 04:55 PM

Thanks for the feedback.

 

I'm still looking for the simplest solution possible... :lol:

 

From what I understand, both Java and .NET produce their own binary format (bytecode), similar to the machine code of native executables. So in theory we can treat them all as different binary structures, but still as files for which we can write a parser.

 

It wouldn't be necessary to get the function name, to understand what is inside each function, or to know what is being called (internal/external dependencies). For basic fingerprinting purposes, we just need to know where each function starts and ends.

 

For example:

function1
elwjdworu239r320
230432048230949
233209432049889
239048320948909
function2
324324902390993
087847938493409
809812623882922
function3
....
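As a rough sketch of the idea above: once the start and end offsets of each function are known, fingerprinting reduces to hashing the bytes in each range. The function name `fingerprint_functions`, the half-open `(start, end)` convention and the use of SHA-256 are assumptions made for illustration only:

```python
import hashlib


def fingerprint_functions(image: bytes, boundaries):
    """Map each half-open (start, end) byte range to the SHA-256 of its bytes."""
    prints = {}
    for start, end in boundaries:
        if not 0 <= start < end <= len(image):
            raise ValueError(f"invalid range: {start}..{end}")
        prints[(start, end)] = hashlib.sha256(image[start:end]).hexdigest()
    return prints


# Hypothetical image holding three "functions"; the first and last are identical.
image = b"AAAABBBBAAAA"
for (start, end), digest in fingerprint_functions(image, [(0, 4), (4, 8), (8, 12)]).items():
    print(start, end, digest[:16])
```

An exact hash only matches byte-identical functions; a similarity hash would be needed to catch near-duplicates.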

I'm still not sure if this is possible at the assembly level without a disassembler like IDA, which generates those nice graphs. Still, if they are doing it, then there must be some (hopefully feasible) way. I found this one for Windows executables: https://github.com/guelfoweb/peframe

 

And some code to extract data from Linux/Windows: https://github.com/o...assniki/one-elf

 

There is this documentation to follow up with: https://msdn.microso...y/ms809762.aspx

 

But still, I wonder if something else is already available that gets the binary content for each function/section in these binaries... :)
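For the section part of that question, a small sketch using only the Python standard library is shown below. It reads the section table of a PE file following the documented header layout (DOS header, PE signature, COFF header, optional header, section table). The function name `pe_sections` is hypothetical, and the sketch does no bounds checking beyond the two signatures, so treat it as a starting point rather than a robust parser:

```python
import struct


def pe_sections(data: bytes):
    """Return (name, raw_offset, raw_size) for each section of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_header_size,) = struct.unpack_from("<H", data, coff + 16)
    # The section table follows the 20-byte COFF header and the optional header.
    table = coff + 20 + opt_header_size
    sections = []
    for i in range(num_sections):
        # Each entry is 40 bytes; only the first 24 bytes are needed here.
        name, _vsize, _vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i
        )
        sections.append((name.rstrip(b"\0").decode("ascii", "replace"), raw_ptr, raw_size))
    return sections
```

Usage would be along the lines of `pe_sections(open("some.exe", "rb").read())`, where `some.exe` is a placeholder path. The code of most executables lives in the section named `.text`, but section names are only a convention, and this does not yet say where individual functions start or end.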

 

 

 



#5 sbaeder

sbaeder

    Gold Member

  • .script developer
  • 1338 posts
  • Location:usa - massachusettes
  •  
    United States

Posted 07 January 2017 - 06:21 AM

It wouldn't be necessary to get the function name, to understand what is inside each function, or to know what is being called (internal/external dependencies). For basic fingerprinting purposes, we just need to know where each function starts and ends.

The issue may also be knowing the language and/or tool used, since it may not be so simple with various entry points, early exits, etc. A lot could depend on the compiler and the optimizations being employed. A linker might get you the entry points, and often, if debug information is compiled in, these may be available.

 

So, given that this is "hard", any hints as to WHY you want this (or the "fingerprint" of a function)? Maybe there are other ways to accomplish the end goal in a different (out-of-the-box) way?

 

P.S. Happy new year, and I hope things are going well for you!



#6 Brito

Posted 07 January 2017 - 08:59 AM

Happy new year! :thumbup:

 

any hints as to WHY you want this (or the "fingerprint" of a function)? Maybe there are other ways to accomplish the end goal in a different (out-of-the-box) way?

 

I already have in place a generic way to detect similar binary files, but it is "dumb" about what is inside them. For example, it can detect whether two source code files are similar (percentage-wise) but won't be able to answer: "snippet X is also found in file Y".

 

For that purpose we wrote a snippet comparison program and then adapted the parser to whichever programming language was being compared; that worked well. Later, we took a similar approach for artwork and can now compare icons vs. JPG vs. PNG regardless of the binary format underneath: https://youtu.be/iT9ObETm-ic

 

What is still missing is support for executable files (.exe, .dll, ...). The generic binary similarity match is available, but it can only answer "file X is nn% similar to file Y". It would be great to one day be able to answer: "function X is also found in file Y".
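To make the "function X is also found in file Y" query concrete, here is a toy sketch of an exact-match index. The class name `CloneIndex`, its methods and the file/function names in the example are all hypothetical, and it assumes the function bytes have already been extracted by some other means:

```python
import hashlib
from collections import defaultdict


class CloneIndex:
    """Toy index from function-body hashes to the places they were seen."""

    def __init__(self):
        self._by_hash = defaultdict(set)

    def add(self, binary_name: str, function_name: str, code: bytes) -> None:
        """Record that `code` appears as `function_name` inside `binary_name`."""
        key = hashlib.sha256(code).hexdigest()
        self._by_hash[key].add((binary_name, function_name))

    def search(self, code: bytes):
        """Return a sorted list of (binary, function) pairs with identical bytes."""
        key = hashlib.sha256(code).hexdigest()
        return sorted(self._by_hash.get(key, ()))


index = CloneIndex()
index.add("a.exe", "func_1", b"\x55\x8b\xec\xc3")
index.add("b.dll", "func_7", b"\x55\x8b\xec\xc3")
print(index.search(b"\x55\x8b\xec\xc3"))
```

This only finds byte-identical copies. Answering the question for recompiled or slightly modified functions would need a fuzzy hash in place of SHA-256.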

 

When we have a Windows executable, how can we discover where each function starts/ends without 3rd party tools? :-)
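One simple heuristic that needs no third-party tools is to scan the code bytes for common 32-bit x86 function prologues (`push ebp; mov ebp, esp`, which is encoded as `55 8B EC` or `55 89 E5`). The sketch below assumes the raw bytes of a code section are already in hand; the names `guess_function_starts` and `guess_function_ranges` are illustrative. It will miss functions compiled without a frame pointer and can report false hits inside data, so the result is a guess, not a reliable boundary list:

```python
# Two encodings of "push ebp; mov ebp, esp" on 32-bit x86.
PROLOGUES = (b"\x55\x8b\xec", b"\x55\x89\xe5")


def guess_function_starts(code: bytes):
    """Return sorted offsets where a known prologue pattern begins."""
    starts = set()
    for pattern in PROLOGUES:
        pos = code.find(pattern)
        while pos != -1:
            starts.add(pos)
            pos = code.find(pattern, pos + 1)
    return sorted(starts)


def guess_function_ranges(code: bytes):
    """Pair each guessed start with the next start (or the end of the code)."""
    starts = guess_function_starts(code)
    ends = starts[1:] + [len(code)]
    return list(zip(starts, ends))


# Hypothetical code section: two tiny functions separated by one padding byte.
section = bytes.fromhex("558bec5dc3" "cc" "5589e531c05dc3")
print(guess_function_ranges(section))
```

Treating the next prologue as the end of the previous function also sweeps padding bytes into the range, which is another reason to regard the output as approximate.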

 

:cheers:

 



#7 erwan.l

Posted 07 January 2017 - 11:45 AM

Well, Microsoft (and I guess all antivirus software vendors) uses a similar approach, I believe.

 

Indeed, I sometimes use the ReadProcessMemory, WriteProcessMemory, CreateRemoteThread, etc. Windows APIs in some of the software I develop.

And MS AV is quick to detect it and delete my newly built executable.

And this happens whether I use early binding, late binding (using LoadLibrary) or other tricks in my code to try to obfuscate it.

The only way I was able to keep the AV from kicking in was to:

- encrypt my PE with XOR (base64 was still being detected)

- run the PE in the memory of a harmless host executable

 

Long story short, in Windows/AV world, it seems possible to detect particular functions/calls.



#8 Holmes.Sherlock

Holmes.Sherlock

    Gold Member

  • Team Reboot
  • 1444 posts
  • Location:Santa Barbara, California
  •  
    United States

Posted 07 January 2017 - 09:05 PM

 

When we have a Windows executable, how can we discover where each function starts/ends without 3rd party tools? :-)

 

Though it doesn't directly answer the issue at hand, this can be an interesting read.



#9 Brito

Posted 08 January 2017 - 07:03 PM

Long story short, in Windows/AV world, it seems possible to detect particular functions/calls.

 

I think they might be sniffing for the external calls. Android does the same in order to list for end users the permissions they need to agree to (albeit without letting end users choose what to share; it's all or nothing).
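How antivirus products actually do this is not known from this thread, but a naive version of "sniffing for external calls" can be sketched as a plain search for API names inside the file, since imported function names are normally stored as ASCII strings. The watchlist contents and the function name `referenced_apis` below are assumptions made for illustration:

```python
# Hypothetical watchlist of Windows API names to look for.
WATCHLIST = ("ReadProcessMemory", "WriteProcessMemory", "CreateRemoteThread")


def referenced_apis(data: bytes, watchlist=WATCHLIST):
    """Return the watchlist names whose ASCII form occurs anywhere in data."""
    return sorted(name for name in watchlist if name.encode("ascii") in data)


# Hypothetical fragment resembling an import name table.
blob = b"\x00\x01CreateRemoteThread\x00kernel32.dll\x00"
print(referenced_apis(blob))
```

This is a substring match over the whole file, so it also fires on names that merely contain a watched name and on strings that are never called. It cannot see names that are encrypted or built at run time.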

 

Though it doesn't directly answer the issue at hand, this can be an interesting read.

 

Very good research, thank you for sharing. That approach seems ideal for a large enough pool of executables. Have they released any source code that we can reuse? :-)

 

:cheers:



#10 Holmes.Sherlock

Posted 19 January 2017 - 10:01 AM

Worth looking at: https://github.com/M...-Plugin-IDA-Pro

 

Assembly code analysis is a time-consuming process. An effective and efficient assembly code clone search engine can greatly reduce the effort of this process, since it can identify the cloned parts that have been previously analyzed. Kam1n0 is a scalable system that supports assembly code clone search. It allows a user to first index a (large) collection of binaries, and then search for the code clones of a given target function or binary file.


#11 Brito

Posted 19 January 2017 - 10:58 AM

The obstacle is that it requires IDA Pro, making it difficult for other users to run the tool without an IDA Pro license.

There was a recommendation to look into this work: http://security.ece....edu/byteweight/

The VM is 4.3 GB, but it might be possible to reduce it to something more usable.

:cheers:





