Event injection in serverless architectures

1. Introduction

The rise of Infrastructure as a Service (IaaS) has opened up new possibilities for computing in general and created new paradigms for application development. Among these, serverless architecture stands out, as it makes it possible to build applications without having to manage physical infrastructure.

However, because it is an emerging technology, developers' lack of experience with the paradigm, combined with common inconsistencies in the hardening of cloud-based environments, creates fertile ground for bugs and security problems. Among these, one of the most relevant, according to the OWASP Serverless Top 10 project (2017), is event data injection, which is the subject of this study.

2. Understanding serverless architecture

In simple terms, serverless architectures are applications built without the developer having to manage an explicit server. Of course, a physical server must still exist for the application's logic to run. The key point of this model is that the server is maintained and managed by the cloud provider and delivered as a service to the application developer.

In this way, the construction of the application is based on services, abstracting the components of a conventional web infrastructure – such as servers, databases, gateways etc. – into a service managed and maintained by the cloud provider.

2.1. A brief example

The following image illustrates the basic architecture of a web application that informs the user of the weather forecast:

Note that, in this scenario, each block represents a service that the provider makes available. These blocks are independent of each other, but they can communicate to build a complete infrastructure for the application.

To communicate, the services emit events to one another. Each event carries a representation of the data generated and can be interpreted by another component.
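As an illustration, the sketch below shows a trimmed-down version of the event Amazon S3 delivers to a function when an object is created, and how the function reads it. The bucket and key names are hypothetical and only the fields used here are kept. S3 URL-encodes object keys in event notifications, so the key has to be decoded before use.

```python
from urllib.parse import unquote_plus

# Trimmed-down S3 "ObjectCreated" notification (hypothetical bucket and key).
# Real events carry many more fields (eventTime, awsRegion, userIdentity...).
sample_event = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-uploads"},
                "object": {"key": "reports/march+2024.pdf", "size": 1024},
            },
        }
    ]
}


def extract_objects(event):
    """Return (bucket, key) pairs from an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (spaces become '+'), so decode them first.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


print(extract_objects(sample_event))
# [('example-uploads', 'reports/march 2024.pdf')]
```

The object key is chosen by whoever uploaded the file, which is why it has to be treated as untrusted input in the scenarios discussed later.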

3. Unraveling functions

As you may have noticed, the previous scenario doesn’t feature any kind of web server responsible for interpreting the user’s input and communicating with the database. That’s because the application uses an abstraction of this component, popularly called a function. Functions are an important point for our study because they are where the “logic” part of the application takes place.

Although similar to a web server, functions differ from these components in that they are only executed when an event – with a previously defined format – occurs.

When creating a function, two main components must be defined by the developer:

What type of event triggers the function? This information is used to start executing the code whenever new data is received.
Which code will be executed? This is the functional part of the component: here the developer enters the code that runs when the function is triggered.
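On AWS, for example, these two components roughly correspond to a trigger configuration and a handler. The sketch below only builds the notification configuration that would attach a function to object-creation events in a bucket; the dictionary follows the structure accepted by boto3's `put_bucket_notification_configuration`, but the ARN, bucket and prefix are hypothetical placeholders and nothing is sent to AWS.

```python
def build_s3_trigger(function_arn, prefix="uploads/"):
    """Build an S3 notification configuration that invokes `function_arn`
    whenever an object is created under `prefix`."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                # Which type of event triggers the function.
                "Events": ["s3:ObjectCreated:*"],
                # Optional filter: only keys that start with the prefix.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


def lambda_handler(event, context):
    """Which code is executed: the platform calls this entry point with the
    event payload and a runtime context object."""
    return {"records": len(event.get("Records", []))}


# Hypothetical ARN. With real credentials, the dict could be passed as
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="example-uploads", NotificationConfiguration=config)
config = build_s3_trigger("arn:aws:lambda:us-east-1:123456789012:function:pdf-to-text")
```

In a real deployment, S3 must also be granted permission to invoke the function before the trigger works.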

Once the function is created, a runtime is established for it internally, hidden from the developer, which waits until a trigger event arrives. When such an event occurs, a container is created (on platforms such as AWS Lambda, a warm container from a previous invocation may instead be reused) and the function code is executed inside it, in isolation.

Thus, a function is nothing more than a runtime that creates ephemeral containers that execute a block of code every time a certain type of data is received.
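The lifecycle can be pictured with the local simulation below, which calls the handler directly instead of going through a real platform. It assumes the AWS Lambda behavior of running module-level code once per execution environment (the cold start) and then reusing that environment for later events while it stays warm. This reuse is also why state left behind by injected code can outlive a single invocation.

```python
import uuid

# Module-level code: runs once when the execution environment starts
# (cold start). Expensive setup such as SDK clients normally lives here.
ENVIRONMENT_ID = uuid.uuid4().hex
invocations = 0


def lambda_handler(event, context):
    """Per-event code: runs on every invocation routed to this environment."""
    global invocations
    invocations += 1
    return {"environment": ENVIRONMENT_ID, "invocation": invocations}


# Local simulation of two events hitting the same warm environment.
first = lambda_handler({}, None)
second = lambda_handler({}, None)
print(first["environment"] == second["environment"], second["invocation"])  # True 2
```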

As this entire structure is managed by the cloud provider, certain security measures are taken to protect its internal infrastructure. Listed below are some of the most interesting for this study:

The container’s operating system is usually Linux-based and utilities capable of making external requests are generally removed from the environment.
All of the container's contents are read-only except for the /tmp directory.
The container cannot be reached directly from the Internet, and no data persistence is guaranteed.
The application code is stored in the /var/task directory.
The environment variables can contain access keys for the provider, such as the temporary credentials of the function's execution role, and for the services the function communicates with.
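Because the environment variables hold credentials, code running in the function should never log or return them wholesale. The defensive sketch below redacts them before logging. `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN` are the variables AWS Lambda uses for the execution role's temporary credentials; the suffix-based rule for other secrets is an assumption to adapt to each application.

```python
import os

# Variables Lambda populates with the execution role's temporary credentials.
AWS_CREDENTIAL_VARS = frozenset(
    {"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"}
)
# Heuristic (assumption): application secrets often use these suffixes.
SENSITIVE_SUFFIXES = ("_KEY", "_SECRET", "_TOKEN", "_PASSWORD")


def is_sensitive(name):
    """Return True for variable names that likely hold secrets."""
    return name in AWS_CREDENTIAL_VARS or name.upper().endswith(SENSITIVE_SUFFIXES)


def redacted_environment(env=None):
    """Return a copy of the environment that is safe to write to logs."""
    source = os.environ if env is None else env
    return {
        name: ("***REDACTED***" if is_sensitive(name) else value)
        for name, value in source.items()
    }


print(redacted_environment({"AWS_SESSION_TOKEN": "abc123", "AWS_REGION": "us-east-1"}))
# {'AWS_SESSION_TOKEN': '***REDACTED***', 'AWS_REGION': 'us-east-1'}
```

Redaction only limits accidental leaks through logs. Injected code can still read the variables directly, so a least-privilege execution role remains the real control.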

4. A new perspective on injection attacks

Now that we understand the structure of this type of architecture, let’s analyze the attack surface it provides for a possible attacker and how injection attacks fit into this environment.

When we look at conventional web applications, the data input path is linear and well-defined, meaning that we know exactly where the data comes from, which path it takes and where it goes.

In serverless applications, on the other hand, the input data is encapsulated in events that will be consumed by the functions. These events can originate from numerous sources and in many different formats, making it difficult to use traditional protection mechanisms such as Web Application Firewalls (WAFs) and similar structures.

As a result, the attacker no longer has a single entry point, but many. In fact, any event-generating source that interacts directly with the function is a potential attack vector.

User input therefore sits much closer to the application code, which considerably expands the attack surface compared to a conventional application. This difference can be seen in the comparison illustrated by the following image:

Source: https://owasp.org/www-pdf-archive//OWASP_DC_SLS_Top10.pdf
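To make the multitude of entry points concrete, the sketch below guesses which AWS service produced an incoming event from its shape. The field names reflect the formats of common sources (S3, SQS and DynamoDB Streams use `eventSource`, SNS uses `EventSource`, API Gateway events carry `httpMethod` or `routeKey`, EventBridge scheduled events carry `"source": "aws.events"`), but the detection logic and the returned labels are a simplified, hypothetical heuristic. Each of these structures is a separate path through which attacker-influenced data can reach the same function.

```python
def classify_event(event):
    """Best-effort guess of which AWS service produced `event`."""
    records = event.get("Records")
    if isinstance(records, list) and records:
        first = records[0]
        # S3, SQS and DynamoDB use "eventSource"; SNS uses "EventSource".
        source = first.get("eventSource") or first.get("EventSource")
        if source:
            return source
    # REST APIs send "httpMethod"; HTTP APIs (payload v2.0) send "routeKey".
    if "httpMethod" in event or "routeKey" in event:
        return "aws:apigateway"  # label chosen for this sketch
    if event.get("source") == "aws.events":
        return "aws:eventbridge"  # label chosen for this sketch
    return "unknown"


samples = [
    {"Records": [{"eventSource": "aws:s3"}]},
    {"Records": [{"EventSource": "aws:sns"}]},
    {"httpMethod": "POST", "body": "{}"},
    {"source": "aws.events", "detail-type": "Scheduled Event"},
]
for sample in samples:
    print(classify_event(sample))
```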

Another important factor, which can have a major impact on security, is that events don’t always follow a linear path between the user and the function. Most of the time, there will be intermediary services between the user’s input and the data destination.

Even if the user's input does not seem dangerous at first, an apparently secure service that processes it can generate contaminated data and open the door to new vulnerabilities.

Any data that comes into contact with user input is therefore potentially contaminated. This means that serverless applications that are not well built and implemented become fertile ground for attackers.
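One way to reason about this is taint tracking: a value derived from user input stays untrusted no matter how many services transform it. The sketch below illustrates the idea with a hypothetical `Untrusted` wrapper and a hypothetical `use_in_command` sink; in practice, taint tracking is usually performed by analysis tools rather than hand-written wrappers.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Untrusted:
    """Wraps a value that originated from, or was derived from, user input."""
    value: str

    def transform(self, func: Callable[[str], str]) -> "Untrusted":
        # Derived values inherit the taint, e.g. the output of an
        # intermediate service that processed the original input.
        return Untrusted(func(self.value))

    def release(self, pattern: str) -> str:
        """Unwrap only if the whole value matches an explicit allowlist."""
        if re.fullmatch(pattern, self.value) is None:
            raise ValueError(f"rejected untrusted value: {self.value!r}")
        return self.value


def use_in_command(argument):
    """Hypothetical sensitive sink: refuses tainted data outright."""
    if isinstance(argument, Untrusted):
        raise TypeError("untrusted data must be validated before reaching a sink")
    return f"would pass {argument!r} to a command"


upload_name = Untrusted("Report.PDF")          # direct user input
normalized = upload_name.transform(str.lower)  # output of an intermediate step
checked = normalized.release(r"[a-z0-9_-]+\.pdf")
print(use_in_command(checked))  # would pass 'report.pdf' to a command
```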

Given the larger attack perimeter and developers' excessive trust in user input, an attacker who can manipulate how events are generated may be able to provoke unexpected reactions in the system and, consequently, manipulate its behavior.

This allows a malicious agent to carry out attacks well known from conventional web applications, adapted to a serverless architecture, usually by exploiting one or more of the following vulnerabilities:

– Injection of SQL and NoSQL commands;

– Inclusion of local and remote files (LFI and RFI);

– Execution of commands in the operating system (OS command injection);

– Performing requests on behalf of the server (SSRF);

– Execution of HTML/JavaScript code (XSS);

– Among others.

5. Attack scenarios

To gain an empirical understanding of how this class of vulnerabilities works, two intentionally vulnerable serverless applications were built.

The first is a service that stores PDF files in a more compact form: the application converts the uploaded PDF files into text files and stores them in a cloud bucket. In this case, the injection is triggered directly by the attacker's file upload, and its result is visible to the attacker.

The second scenario involves a conventional web application whose files are also stored in a cloud bucket. In this case, however, the injection is blind to the attacker: it is triggered by a scheduled backup of the files performed by another cloud provider service. This case is detailed later in this article.

5.1. Case 1 – Injection via file upload

For this case, the attacker’s input is a malicious PDF file that will be sent to the application’s bucket and, consequently, processed by a function in order to be converted into a text file.

In this scenario, the developer didn’t take care to correctly sanitize the names of the files sent by the application’s users and, as a result, left a loophole for malicious data to be sent in this field.
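A minimal sketch of the missing check follows, assuming the application only needs to accept simple PDF names. Rather than trying to strip dangerous characters (a denylist that is easy to bypass), it accepts only names matching an explicit allowlist. The pattern and the 100-character limit are illustrative assumptions, not requirements of any cloud service.

```python
import re

# Allowlist (assumption): must start with a letter, digit or '_', then
# letters, digits, '_', '-', with dots only between segments, ending in
# ".pdf" (case-sensitive). Spaces, ';', '|', '$', quotes, backticks and
# path separators are all rejected, as is a leading '-' (option injection).
SAFE_PDF_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*(?:\.[A-Za-z0-9_-]+)*\.pdf")
MAX_NAME_LENGTH = 100


def is_safe_pdf_name(name: str) -> bool:
    """Return True only for short, plain PDF file names."""
    return len(name) <= MAX_NAME_LENGTH and SAFE_PDF_NAME.fullmatch(name) is not None


for candidate in ["report.pdf", "q1-2024.final.pdf", "x.pdf; echo hi", "../secret.pdf"]:
    print(candidate, is_safe_pdf_name(candidate))
```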

Note that, in the function’s code, the user’s input – the file name – is passed directly to the command line which calls a binary responsible for converting the document into text.

This way, an attacker who deduces that a serverless architecture is in use would be able to craft a payload and inject commands into the operating system.
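The core of this kind of flaw, and its fix, can be sketched as follows. The function names, output path and 30-second timeout are assumptions; `pdftotext` is the Poppler/Xpdf command-line utility that converts PDF to text. The point is the difference between handing a shell a string built from the file name and passing the file name as a single argument with no shell involved.

```python
import subprocess
from pathlib import PurePosixPath

WORK_DIR = "/tmp"  # the only writable location inside the container


def vulnerable_command(file_name):
    # DON'T: the name is interpolated into a command line. If this string is
    # run with subprocess.run(cmd, shell=True), characters such as ';' or '|'
    # in the name are interpreted by the shell.
    return f"pdftotext {WORK_DIR}/{file_name} {WORK_DIR}/output.txt"


def safe_args(file_name):
    # DO: build an argument vector. Each list element reaches pdftotext as
    # exactly one argument, and no shell ever parses the file name.
    name = PurePosixPath(file_name).name  # drop any directory components
    source = f"{WORK_DIR}/{name}"  # absolute path, so it can't look like an option
    return ["pdftotext", source, f"{WORK_DIR}/output.txt"]


def convert(file_name):
    """Run the conversion without a shell (requires pdftotext installed)."""
    subprocess.run(safe_args(file_name), check=True, timeout=30)
```

Even with an argument vector, the name still reaches `pdftotext` as data, so an allowlist check on the name remains a useful second layer.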

To validate this possibility, let’s analyze each part of the possible payload below:

First, we terminate the pdftotext command with the “;” character, which makes it possible to execute a chain of commands;
Next, we retrieve the environment variables of the function's host container with the env command. Note that the output of this command is passed to the next command using the “|” (pipe) operator;
Finally, we make an external call through the function's runtime, sending the data to the attacker's server in the X parameter.
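Why the “;” and “|” characters matter can be shown without executing anything. The sketch below uses Python's `shlex` module with `punctuation_chars=True` to approximate how a POSIX shell would split a command line built from a hostile-looking but harmless file name, and compares it with the same name escaped by `shlex.quote`. The file name is a benign stand-in, not the payload from the article, and `shlex` only approximates real shell parsing.

```python
import shlex


def shell_tokens(command_line):
    """Approximate POSIX shell tokenization, keeping operators like ';' and '|'."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    return list(lexer)


# Harmless stand-in for a malicious upload name.
name = "report.pdf; echo injected | cat"

unsafe = f"pdftotext /tmp/{name} /tmp/out.txt"
quoted = f"pdftotext {shlex.quote('/tmp/' + name)} /tmp/out.txt"

print(shell_tokens(unsafe))
# ['pdftotext', '/tmp/report.pdf', ';', 'echo', 'injected', '|', 'cat', '/tmp/out.txt']
print(shell_tokens(quoted))
# ['pdftotext', '/tmp/report.pdf; echo injected | cat', '/tmp/out.txt']
```

In the unquoted version, the shell sees three separate commands joined by operators; once quoted, the whole name is a single argument. Avoiding the shell entirely, by passing an argument list, remains the more robust option.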

So, if a file is uploaded whose name is exactly the payload above, the function, when executed, should make an external call and send the environment variables to the attacker's server.

The image below shows the information arriving at the external server after the file has been sent:

When decoding the information obtained, it’s possible to see the presence of the access keys used by the function. From this information, if the cloud environment has not been correctly configured, it would be possible for the attacker to look for new vulnerabilities in order to escalate privileges and take complete control of this environment.

5.1.1. Injection through chained files

The above scenario is useful for executing a short chain of commands; however, when a longer sequence of commands is required, problems can arise from the limit on the length of file names.

To get around this problem, we can chain two files together to perform more complex attacks. In this case, one of the files is responsible for storing the malicious Python code, while the other file is responsible for acting as a trigger, retrieving the code from the first file and transmitting it to the function’s runtime.

In order to extract the source code, let’s take the previous application as an example. Initially, we can create a file called “payload.pdf” with the following content and submit it to the application:

Note that the content of the file above is code that reads the contents of all the Python files in the /var/task directory and sends them to the attacker's server. However, sending this file alone won't trigger any behavior in the application; a second file is needed, with a payload in its name, which obtains the contents of the “payload.pdf” file and passes them to the function's runtime.

The name of this second file should therefore follow the pattern described below:

When this data is sent, the application is expected to execute the contents of the “payload.pdf” file and send the attacker's server the contents of all the .py files stored in the function's default directory. As shown below, it was possible to obtain this base64-encoded material through the “q” parameter:

After decoding the exfiltrated content, it's possible to read all of the application's source code:

5.2. Case 2 – Blind event injection

As the serverless environment is so vast and numerous sources can be associated with a single function, event injection can arrive through anything from SMS messages to traditional HTTP requests. As long as the inputs are not sanitized correctly, the attacker has a world of possibilities. The problem is that these scenarios are not always easy to identify; each application has its own particularities.

For example, imagine an application that stores files in an S3 bucket, similar to services such as Dropbox and Google Drive. The storage process is done securely, with access control, logging, etc. Alongside this process, an internal routine is triggered weekly to back up the stored files and archive them in S3 Glacier.

In this scenario, the backup process (dotted area) requires no direct user interaction and runs completely autonomously via a scheduled event generator (similar to a cron job).

During this process, if the function performs any kind of processing of the user’s data, either to read the contents of the files or to use the names as indexes for some routine, the function will be potentially vulnerable, even if this structure is not accessible to the end user.

The main difference will be that the execution of the code will not be immediate, as it will only occur when the routine is executed, causing the payload to remain “dormant” for an arbitrary period of time.
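The sketch below illustrates the defensive side of this scenario for a hypothetical weekly backup function. The scheduled event that triggers it carries no user data at all, yet the object keys it reads from the bucket were chosen by users, so they are validated like any direct input. The `list_keys` and `archive` callables, the key pattern and the `archive/` prefix are assumptions, injected so the logic runs locally; in AWS they would wrap S3 API calls.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Allowlist for stored keys (assumption): no spaces, shell metacharacters
# or path separators, at most 201 characters.
SAFE_KEY = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,200}")


def backup_handler(
    event: dict,
    list_keys: Callable[[], Iterable[str]],
    archive: Callable[[str, str], None],
) -> Tuple[List[str], List[str]]:
    """Archive stored objects, skipping keys that fail validation.

    `event` is the scheduled trigger; its contents are ignored because the
    data that matters (the keys) comes from storage, not from the event.
    """
    archived, rejected = [], []
    for key in list_keys():
        if not SAFE_KEY.fullmatch(key):
            rejected.append(key)  # a dormant payload stops here
            continue
        archive(key, f"archive/{key}")  # keys are passed as data, never as code
        archived.append(key)
    return archived, rejected


copies = []
archived, rejected = backup_handler(
    {"source": "aws.events", "detail-type": "Scheduled Event"},
    list_keys=lambda: ["report.pdf", "notes.txt", "x.pdf; echo dormant"],
    archive=lambda src, dst: copies.append((src, dst)),
)
print(archived, rejected)
```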

6. Post-exploitation and completion

As shown above, an attacker can obtain sensitive data about the application, such as its source code or information linked to it. Moreover, since these vulnerabilities occur inside cloud-based architectures, a range of further possibilities opens up. For example:

In cases of misconfigured permissions, it's possible to gain access to the cloud console with the credentials obtained;
Since a persistent runtime must exist to create the containers that execute the function code, and the injected code runs within it, it may be possible to establish a persistent connection (backdoor) in this context and intercept all the event data that is received;
By gaining access to poorly configured environments, services can be created and, as a result, a “ghost” application can be hosted for various purposes;
In order to obtain persistence in the environment, it may also be possible to create users without the cloud administrator noticing and take control of the environment;
Through permissive policies, it may also be possible to escalate privileges, exploit vulnerable services, exfiltrate user data, compromise accounts protected by weak passwords and move laterally within the internal infrastructure.

However, it’s important to emphasize that these are just a few possibilities; each case will depend on how the environment was built and which permissions are assigned to it.

7. Mitigation

As this article has shown, injection attacks stem mainly from excessive trust in user input or in a particular service, so one way to protect against them is to adopt a zero trust architecture.

This implies never trusting the input received and always verifying it before any processing is done, as well as sanitizing the data without any kind of assumption as to its origin or veracity.
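Applied to a function, this means validating every event at the boundary, whatever its source. The decorator below is a minimal standard-library sketch of that idea; the field-to-type specification format and the example handler are hypothetical, and a schema validation library would typically replace them in practice.

```python
from functools import wraps


class InvalidEvent(ValueError):
    """Raised when an event does not match the expected shape."""


def validate_event(required):
    """Reject events missing a required top-level field or carrying a wrong type.

    `required` maps field names to expected Python types (hypothetical format).
    """
    def decorator(handler):
        @wraps(handler)
        def wrapper(event, context=None):
            if not isinstance(event, dict):
                raise InvalidEvent("event must be a JSON object")
            for field, expected_type in required.items():
                value = event.get(field)
                # bool is a subclass of int in Python, so reject it for int fields.
                if not isinstance(value, expected_type) or (
                    expected_type is int and isinstance(value, bool)
                ):
                    raise InvalidEvent(f"field {field!r} must be {expected_type.__name__}")
            return handler(event, context)
        return wrapper
    return decorator


@validate_event({"file_name": str, "size": int})
def upload_handler(event, context=None):
    return {"accepted": event["file_name"], "size": event["size"]}
```

Type checks alone do not make a string safe; content rules, such as an allowlist for file names, still have to be applied to the validated fields.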

It's also recommended to adopt a secure development policy to mitigate as many vulnerabilities and bugs as possible during development, and to use static code analysis tools, whether integrated into the cloud environment or as standalone software. These tools are extremely important for catching issues that might go unnoticed, but they are not a silver bullet and do not remove the need for human code review.

It's also extremely important to apply hardening techniques to cloud environments, making sure to isolate and correctly classify users and access groups. The services used must also employ proper access control, and their credentials must be stored securely and kept inaccessible to unauthorized people.
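As an example of such hardening, the execution role of the PDF conversion function from case 1 would only need to read the uploaded objects and write the converted text. The sketch below builds such a least-privilege IAM policy document; the bucket names are hypothetical, and a real deployment may need additional statements (for example, for writing logs to CloudWatch Logs).

```python
import json


def conversion_role_policy(input_bucket, output_bucket):
    """Least-privilege policy: read objects from one bucket, write to another."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadUploads",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{input_bucket}/*"],
            },
            {
                "Sid": "WriteConvertedText",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{output_bucket}/*"],
            },
        ],
    }


print(json.dumps(conversion_role_policy("example-pdf-uploads", "example-pdf-text"), indent=2))
```

With a role limited to these statements, credentials stolen through an injection like the one in case 1 would only allow reading and writing objects in those two buckets, not creating users or new services.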

Finally, although WAFs (Web Application Firewalls) are ineffective for certain types of communication, their use is strongly recommended where they are effective, such as for HTTP requests; they can be placed between the client and the API gateway to protect specific points in the application.

References

AWS Lambda Function. Available at: <https://aws.amazon.com/lambda/>

AWS Lambda command injection. Available at: <https://www.safe.security/resources/blog/aws-lambda-command-injection>

OWASP Serverless Top 10. Available at: <https://owasp.org/www-project-serverless-top-10/>

Hacking AWS Lambda for security fun and profit. Available at: <https://blog.appsecco.com/hacking-aws-lambda-for-security-fun-and-profit-c140426b6167>

Serverless Architectures Security Top 10. Available at: <https://github.com/puresec/sas-top-10>