A02:2021 - Cryptographic failures - Colisiones Hash

October 21, 2024 Olga Borrallo

What are cryptographic flaws?

A cryptographic failure occurs when data protection is not adequate, regardless of whether the failure occurs due to incorrect use of a cryptographic algorithm, a failure in its implementation, or the use of insecure practices. The vulnerabilities that occur can affect the confidentiality and integrity of an application's information.

Cryptographic flaws are a security risk since they allow a potential malicious actor to decrypt, manipulate or access information that, by design, they should not have access to.

Modern cryptography can be understood as the use of robust algorithms, consequently considered secure. Among them are AES; RSA or SHA-256, despite this, its correct implementation is crucial. Incorrect key generation, the use of obsolete algorithms, such as SHA1 or MD5; or putting into production predictable number generators, can result in attacks against the application's cryptography.

Use of obsolete algorithms

One of the most common cryptographic flaws is the use of obsolete hash functions and encryption algorithms, previously referenced, SHA1 and MD5. Although they were standard in their time (1995 and 1992, respectively), they are not considered secure today due to advances in attack techniques, such as hash collision searching. These attacks allow a potential malicious actor to generate two different inputs that produce the same hash, which could result in information theft.

Symmetric encryption algorithms must also be taken into account, among which there are also some considered obsolete, DES or 3DES, which do not provide adequate robustness for the times in which we find ourselves, given that over time and resources enough the data can be decrypted.

History: MD5 and SHA1

The MD5 (Message Digest 5) and SHA1 (Secure Hash Algorithm 1) algorithms are hash functions that have been widely used to verify data integrity. However, over time, high-impact vulnerabilities were discovered in both algorithms, allowing malicious actors to carry out collision attacks (among others), resulting in a security breach. As a result, both MD5 and SHA1 are considered obsolete and are not recommended for use in functionality that requires high security.

MD5 - Message Digest 5

Algorithm designed in 1991, generates a 128-bit hash from a data set of any size. Originally used to verify the integrity of files and images, and password hashes.

Collision attacks: in 2004 researchers Xiaoyun Wang, Dengguo Feng, Xuejia Lai and Hongbo Yu; showed that MD5 is vulnerable to collisions.

Xie, T., Liu, F., & Feng, D. (2013). Fast collision attack on MD5. Cryptology ePrint Archive.

Preimage attacks: Although less common, preimage attacks are theoretically possible in MD5, where a malicious actor could attempt to reconstruct the original message from the hash.

Sasaki, Y., & Aoki, K. (2009). Finding preimages in full MD5 faster than exhaustive search. In Advances in Cryptology-EUROCRYPT 2009: 28th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne, Germany, April 26-30, 2009. Proceedings 28 (pp. 134-152). Springer Berlin Heidelberg.

SHA1 - Secure Hash Algorithm 1

SHA1, developed by the NSA in 1993, generates a 160-bit hash. Its use was standardized, found in protocols such as SSL/TLS and PGP. In recent years it has been shown to be vulnerable to collision attacks.

In 2017, Google, in association with the University of Amsterdam, demonstrated that collisions could be generated in SHA1, publishing https://shattered.io/ for free public consultation.

Hash Collisions

Hash Functions

A hash function is an algorithm that takes a variable length input and converts it into a variable output (hash or digest). Hash functions are designed to be unidirectional, the input cannot be recovered from the output, and collision resistant. However, in certain algorithms it has already been demonstrated that resilience to collisions is not always perfect.

But what are hash collisions?

A hash collision occurs when two different inputs produce the same output or digest. This situation dismantles one of the essential properties of hash functions, the uniqueness of the digest.

The security of systems such as digital signatures, file integrity, and SSL/TLS certificates can be compromised through hash collisions. If a malicious actor manages to modify the content of a message without altering its hash, they can ensure that the modifications go unnoticed and the message is considered legitimate.

MD5 collision example

For the collision example, two text strings are used, which differ by one byte. Credit to Marc Stevens for his post .

Chain1:

TEXTCOLLBYfGiJUETHQ4hAcKSMd5zYpgqf1YRDhkmxHkhPWptrkoyz28wnI9V0aHeAuaKnak

Chain 2:

TEXTCOLLBYfGiJUETHQ4hEcKSMd5zYpgqf1YRDhkmxHkhPWptrkoyz28wnI9V0aHeAuaKnak

TEXTCOLLBYfGiJUETHQ4hAcKSMd5zYpgqf1YRDhkmxHkhPWptrkoyz28wnI9V0aHeAuaKnak

⬆

|

⬇

TEXTCOLLBYfGiJUETHQ4hEcKSMd5zYpgqf1YRDhkmxHkhPWptrkoyz28wnI9V0aHeAuaKnak

In order to demonstrate this, a small proof of concept will be carried out in C:

#include

#include

#include

int main() {

    char *input1 = "TEXTCOLLBYfGiJUETHQ4hAcKSMd5zYpgqf1YRDhkmxHkhPWptrkoyz28wnI9V0aHeAuaKnak";

    char *input2 = "TEXTCOLLBYfGiJUETHQ4hEcKSMd5zYpgqf1YRDhkmxHkhPWptrkoyz28wnI9V0aHeAuaKnak";

    unsigned char hash1[MD5_DIGEST_LENGTH];

    unsigned char hash2[MD5_DIGEST_LENGTH];

    MD5((unsigned char*)input1, strlen(input1), hash1);

    MD5((unsigned char*)input2, strlen(input2), hash2);

    for(int i = 0; i

    printf("\n");

    for(int i = 0; i

    printf("\n");

    printf(memcmp(hash1, hash2, MD5_DIGEST_LENGTH) == 0 ? "Collision found!\n" : "No collision\n");

    return 0;

And you get a hash collision, two different alphanumeric strings result in the same MD5 hash. This, taken to an access system in which the credentials are stored using MD5 hashes, could lead to unauthorized access with a credential other than the original one.

How do you find hash collisions? → Chosen Prefix Collision

Identifying a hash collision is like finding a needle in a haystack, the amount of computational resources required to find two (for example) arbitrary strings which result in the same hash is enormous.

For this reason, techniques such as Chosen Prefix Collision (CPC) have been developed, where two messages with specific prefixes can be selected and variations created that result in a hash collision.

Theoretical framework

The goal of the attack is to find two messages M1 and M2, each with known and controlled prefixes P1 and P2 , resulting in the same digest. It can be expressed as:

Where:

H is the hash function (MD5 or SHA1).
P1 and P2 are the chosen prefixes.
M1 and M2 are the variable parts to identify.
∣∣ denotes concatenation.

The risk posed by this particular attack is high given that the prefixes can be part of a file or message with a known structure (such as a digital certificate), which could allow data to be manipulated strategically in order to achieve the collision without that visible or critical elements be modified.

Process of the Attack

Considering the structure of a hash function that is divided into blocks:

Where My represents the fixed size blocks in the input. The calculation of the hash function, H, is performed iteratively on these blocks, where each block modifies an intermediate state Si.

During the CPC attack the prefixes P1 and P2 will be processed first, generating an intermediate state for the hash calculation.

To achieve the objective of finding two blocks M1 and M2, which, concatenated with the prefixes P1 and P2 produce the same final hash value, blocks M1 and M2 must be adjusted so that the differences between S1 and S2 i> are canceled in subsequent intermediate states.

At least one My block must compensate for the difference between the intermediate states.

Where:

⊕ represents the XOR operation.
ΔS is the difference introduced in the intermediate states to balance the disparity between S1 and S2.

Simplifying, if it is assumed that a collision of intermediate states can be forced, at the end of the chosen blocks we would obtain:

Resulting in the adjustment of M1 and M2 so that, despite having different prefixes P1 and P2, the same final intermediate state S1 occurs.

Achieving equality involves introducing calculated differences in the message blocks M1 and M2, applying differences in the bits of the message block that counteract the differences in the intermediate states of the hash function. These differences will be calculated in such a way that their propagation through the hash function leads to the same final state.

Assuming differences are introduced in message block M1, such as ΔM such that:

With the objective that the intermediate state after applying the block M1⊕ΔM is equal to the intermediate state after applying the block M2.

Conclusion

The Chosen Prefix Collision attack exploits the interactive nature of hash functions. By manipulating blocks of data and applying calculated differences, intermediate states can be made to match, causing the final hash output to collide.

This attack is a test of the fundamental weaknesses of hash functions such as MD5 and SHA1, primarily against advanced attacks that manipulate input data in a controlled manner.

Author: Daniel Puente

return to blog