The herdProtect fuzzy engine is an experimental anti-malware scanner that uses static and dynamic fuzzy matching to find new malware variants, including samples that some anti-virus engines fail to detect. The engine generates a set of fuzzy signatures from a file's structure and its active behaviors, then compares those signatures against known malware variants. This lets it quickly and efficiently match new files found in the wild to existing ones.
One such fuzzy hash the engine uses is the context triggered piecewise hash (CTPH), often referred to by the name of the tool that popularized it, ssdeep. CTPH is designed to find nearly identical file content based solely on the structure of a file on disk. Homologous files share identical sets of bits in the same order. Because such files are not completely identical, traditional techniques such as cryptographic hashing cannot be used to identify them: changing even a single byte produces an entirely different digest. CTPH instead builds a signature by combining a number of traditional hashes, each computed over a piece of the input whose boundaries are determined by the context of the input itself. These signatures can identify modified versions of known files even after data has been inserted, modified, or deleted.
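The Python sketch below illustrates the piecewise idea. It is modeled on ssdeep's published design: a rolling hash over a 7-byte window decides where each piece ends, an FNV-style hash summarizes each piece, and each piece contributes one base64 character. It is a simplified illustration, not herdProtect's code. It also omits ssdeep's signature-length caps and block-size retry loop, so its output will not match ssdeep byte for byte. The function and class names are hypothetical.

```python
import hashlib
import random

# Simplified CTPH sketch modeled on ssdeep's design. Not herdProtect's code,
# and its output is not byte-for-byte compatible with ssdeep.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7              # bytes in the rolling window
MIN_BLOCKSIZE = 3
SPAMSUM_LENGTH = 64     # target number of pieces per signature
FNV_INIT = 0x28021967
FNV_PRIME = 0x01000193
MASK = 0xFFFFFFFF       # emulate 32-bit unsigned arithmetic


class RollingHash:
    """Hash of the last WINDOW bytes; it decides where pieces end."""

    def __init__(self) -> None:
        self.window = [0] * WINDOW
        self.h1 = self.h2 = self.h3 = 0
        self.n = 0

    def update(self, c: int) -> int:
        self.h2 = (self.h2 - self.h1 + WINDOW * c) & MASK
        self.h1 = (self.h1 + c - self.window[self.n % WINDOW]) & MASK
        self.window[self.n % WINDOW] = c
        self.n += 1
        self.h3 = ((self.h3 << 5) ^ c) & MASK
        return (self.h1 + self.h2 + self.h3) & MASK


def piece_signature(data: bytes, blocksize: int) -> str:
    """Emit one base64 character per context-triggered piece of data."""
    roll = RollingHash()
    h = FNV_INIT
    sig = []
    pending = False
    for c in data:
        h = ((h * FNV_PRIME) & MASK) ^ c                 # FNV-style hash of the current piece
        pending = True
        if roll.update(c) % blocksize == blocksize - 1:  # context trigger: end of piece
            sig.append(B64[h % 64])
            h = FNV_INIT
            pending = False
    if pending:                                          # trailing, untriggered piece
        sig.append(B64[h % 64])
    return "".join(sig)


def ctph(data: bytes) -> str:
    """Return 'blocksize:sig1:sig2', with sig2 computed at twice the block size."""
    blocksize = MIN_BLOCKSIZE
    while blocksize * SPAMSUM_LENGTH < len(data):
        blocksize *= 2
    sig1 = piece_signature(data, blocksize)
    sig2 = piece_signature(data, blocksize * 2)
    return f"{blocksize}:{sig1}:{sig2}"


# Demo on synthetic data: insert a few bytes into the middle of a buffer.
rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(4096))
variant = original[:2000] + b"INSERTED" + original[2000:]

print(hashlib.sha1(original).hexdigest())
print(hashlib.sha1(variant).hexdigest())
print(ctph(original))
print(ctph(variant))
```

A piece boundary depends only on the 7 bytes in the rolling window. As a result, the inserted bytes disturb only the pieces around the insertion point. Every piece before it is unchanged, and the boundaries after it realign. The two CTPH signatures therefore stay largely alike, while the two SHA-1 digests share nothing useful.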
How does this work?
As an example, consider two known adware files, both of them web browser extensions. Their cryptographic hashes are completely different, yet comparing their CTPH hashes yields a 97% match based solely on the structure of the files. The fuzzy match also takes the dynamic behaviors of the files into account: both are BHOs (Browser Helper Objects) that use the same CLSIDs, both make similar network connections, and so on.
SHA-1 match: 0%
CTPH match: 97%
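Turning two signatures into a percentage like the ones above requires a comparison step. The sketch below shows one simplified, hypothetical scheme: it scores the weighted edit distance between two signatures on a 0 to 100 scale. This is not ssdeep's exact formula, which additionally requires a shared run of characters and caps scores for small block sizes, and it is not herdProtect's method. The sketch also shows why a cryptographic hash gives an all-or-nothing result. Finally, it uses Jaccard similarity over sets of behavior labels as one possible way to measure shared dynamic behaviors. The signature strings and behavior labels are made up for illustration.

```python
import hashlib

# Simplified, hypothetical scoring; not ssdeep's exact algorithm or herdProtect's method.


def weighted_edit_distance(a: str, b: str) -> int:
    """Edit distance with insertion/deletion cost 1 and substitution cost 2."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                            # delete ca
                cur[j - 1] + 1,                         # insert cb
                prev[j - 1] + (0 if ca == cb else 2),   # keep or substitute
            ))
        prev = cur
    return prev[-1]


def signature_score(sig_a: str, sig_b: str) -> int:
    """Map distance to 0-100; the distance never exceeds len(a) + len(b)."""
    if not sig_a and not sig_b:
        return 100
    d = weighted_edit_distance(sig_a, sig_b)
    return round(100 * (1 - d / (len(sig_a) + len(sig_b))))


def ctph_score(hash_a: str, hash_b: str) -> int:
    """Compare 'blocksize:sig1:sig2' strings at equal or adjacent block sizes."""
    bs_a, s1_a, s2_a = hash_a.split(":")
    bs_b, s1_b, s2_b = hash_b.split(":")
    bs_a, bs_b = int(bs_a), int(bs_b)
    if bs_a == bs_b:
        return max(signature_score(s1_a, s1_b), signature_score(s2_a, s2_b))
    if bs_a * 2 == bs_b:
        return signature_score(s2_a, s1_b)
    if bs_b * 2 == bs_a:
        return signature_score(s1_a, s2_b)
    return 0  # block sizes too far apart to compare


def sha1_score(data_a: bytes, data_b: bytes) -> int:
    """A cryptographic hash either matches exactly or not at all."""
    return 100 if hashlib.sha1(data_a).digest() == hashlib.sha1(data_b).digest() else 0


def behavior_score(behaviors_a: set, behaviors_b: set) -> int:
    """Jaccard similarity of two sets of observed behaviors, as a percentage."""
    union = behaviors_a | behaviors_b
    if not union:
        return 100
    return round(100 * len(behaviors_a & behaviors_b) / len(union))


# Hypothetical inputs for illustration only.
sample_a = "96:ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef:ghijklmnop"
sample_b = "96:ABCDEFGHIJKLMNOPQRSTUVWXYZabcXeY:ghijklmnoq"
behaviors_a = {"bho", "clsid:{hypothetical-clsid}", "net:ads.example.com"}
behaviors_b = {"bho", "clsid:{hypothetical-clsid}", "net:tracker.example.net"}

print(f"SHA-1 match: {sha1_score(b'file A bytes', b'file B bytes')}%")  # SHA-1 match: 0%
print(f"CTPH match: {ctph_score(sample_a, sample_b)}%")                 # CTPH match: 94%
print(f"Behavior overlap: {behavior_score(behaviors_a, behaviors_b)}%")  # Behavior overlap: 50%
```

The signatures are compared only at equal or adjacent block sizes because a signature character summarizes a piece of roughly one block size. Pieces of very different sizes carry no comparable information.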