Function as a Service (FaaS) is a relatively new architectural pattern. It came into existence when major cloud providers such as AWS started offering products like Lambda, followed by Azure Functions (Microsoft Azure) and Google Cloud Functions (Google Cloud). The idea behind these products is that sometimes you do not need a service in “always-on” mode. Instead, you want a “one-off” type of service: it is activated only when a request arrives and then “dies.” If a new request needs to be fulfilled, a new instance of the service is launched, and so on.
To help you better understand when the FaaS model is suitable, consider the following use cases:
ETL (Extract, Transform, and Load): assume that part of your application is responsible for consuming data from a message queuing system, a social media platform, or even a traditional FTP site. Once the data is obtained, it is processed and finally saved to a database. The problem is that this data arrives at random intervals (say your application is retrieving any tweets containing the #globalwarming hashtag). If you deploy a compute instance that runs 24/7 waiting for new tweets, you pay for it even while it sits idle, and the bill grows further if data processing needs considerable CPU and memory. A much better approach is to use FaaS. With this model, as soon as new data arrives from the API, a new function is launched (ideally in a container) to do the required processing. When execution is complete, the container is terminated, releasing any resources it was using. Using FaaS in ETL systems can be depicted in the following diagram:
Two-factor authentication: you have a web application that receives a large number of visitors every day, and because of its growing popularity you decide to tighten security to guard against hacking attempts. You implement a two-factor authentication system in which users enter their passwords and then a one-time password (OTP) code that is sent to their registered cell phones. The problem is that having the web application itself send the SMS is a blocking operation: it delays the page for the end user and increases the load on the web application server, especially at peak times. A possible solution is to delegate sending the SMS to a function. Once triggered, the function runs in another container to fulfill the request (i.e., sending the SMS). Having verified the username and password, the web application immediately displays a second page that asks the user to enter the OTP code sent to their cell phone. The following diagram helps explain the advantage of using FaaS in such a scenario:
Serverless programming: some applications of FaaS include creating whole web applications without having to manage any servers. For example, a web application made up of static content (HTML, JavaScript, images, CSS, and so on), dynamic content generated by an application server (PHP, Go, Ruby, Python, etc.), and a backend database can be architected using FaaS and serverless programming as follows:
Now that you have an understanding of FaaS, and why and when it should be used, it’s time to do a quick hands-on exercise to demonstrate how we can use this model in Kubernetes. Among the well-known tools that make this possible is kubeless. Kubeless can be thought of as an add-on to Kubernetes. It creates a Custom Resource (and a Controller that handles it) and it also offers a handy command-line tool that allows you to easily issue commands. Through the rest of this article, we are going to install kubeless in our cluster and use it to implement a very simple, minimalistic FaaS model for a web application.
Installation is pretty simple. It only involves issuing three commands against your running Kubernetes cluster:
$ export RELEASE=$(curl -s https://api.github.com/repos/kubeless/kubeless/releases/latest | grep tag_name | cut -d '"' -f 4)
$ kubectl create ns kubeless
$ kubectl create -f https://github.com/kubeless/kubeless/releases/download/$RELEASE/kubeless-$RELEASE.yaml
The above commands create a namespace and deploy the necessary components for kubeless to work. If you’re planning to use kubeless extensively, we strongly recommend that you also install the command-line tool that they provide:
export OS=$(uname -s| tr '[:upper:]' '[:lower:]')
curl -OL https://github.com/kubeless/kubeless/releases/download/$RELEASE/kubeless_$OS-amd64.zip && \
unzip kubeless_$OS-amd64.zip && \
sudo mv bundles/kubeless_$OS-amd64/kubeless /usr/local/bin/
The above commands install the kubeless CLI tool on your Linux or macOS system. If you’re running Windows, refer to the documentation for specific steps.
We’re going to write a very simple web application that accepts a URL from the user and then scrapes the website for important data. Eventually, the data is sanitized and saved to a backend database. Using FaaS, we can launch a function to scrape the data asynchronously so that it does not block the UI for the user. The model can be extended further to call another function that notifies the user that their data has been scraped successfully and is ready for download (perhaps in the form of CSV or PDF).
Our workflow can be detailed as follows:
We can draw a simple diagram that describes the application’s workflow as follows:
Now that you know how the system looks, let’s move on to actually building it.
The website we’re using is https://webscraper.io/test-sites/e-commerce/allinone. It’s a website created specifically for testing web scraping tools. Now, let’s go through the lab steps.
You can use a number of runtime environments for the function, which means that you can code the function in your favorite programming language. In this lab, we’re using Python. Our Kubeless function lives in the scraper.py file (the name doesn't matter). The contents of the file are as follows:
from bs4 import BeautifulSoup
import requests
import json


def main(event, context):
    # The request payload is available under event['data'].
    url = event['data']['url']
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    # Collect the description text of every product card on the page.
    items = [
        x.find("p", {"class": "description"}).text
        for x in soup.find_all("div", {"class": ["col-sm-4", "col-lg-4", "col-md-4"]})
    ]
    return json.dumps(items)
The script simply scrapes the page and returns the descriptions of the items displayed for sale. The first thing we need to notice here is the structure of the file:
If you’ve used Python before, you may notice that we are using a couple of non-standard libraries, namely: requests and bs4 (BeautifulSoup). This brings up an important question:
More often than not, you’ll need to install third-party libraries to carry out complex tasks. In our lab, we’re using bs4 for web scraping and requests to easily fetch the web page data. To allow the Kubeless function to use external libraries you must do two things:
Now, you can simply deploy the zip file as the function file. However, etcd, by design, cannot store objects larger than 1.5 MB. This means that we cannot store our function in the Kubernetes database if the zip file is large (which is often the case). The solution is to store the file on a file server and provide a URL to its location. In our lab, we used GitHub for this purpose, but you can use any HTTP server as long as it’s reachable from the pod. Now, let’s see how we can deploy our function.
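Before uploading the package, it can help to check its size and compute its digest. The sketch below is one way to do that in Python. The package_info and exceeds_etcd_limit helpers are hypothetical names, and the sketch assumes that the checksum field of a function definition holds the SHA-256 digest of the zip file in the form sha256:<hex>.

```python
import hashlib
import os

# Approximate etcd object size limit mentioned above (1.5 MB).
ETCD_LIMIT_BYTES = int(1.5 * 1024 * 1024)


def package_info(path):
    """Return (size_in_bytes, 'sha256:<hex digest>') for the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large packages do not have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return os.path.getsize(path), "sha256:" + digest.hexdigest()


def exceeds_etcd_limit(size):
    """True when the package is too large to embed and should be hosted at a URL."""
    return size > ETCD_LIMIT_BYTES
```

Calling package_info on the zip file returns its size and a string that can be compared with the checksum value in the function definition.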
As is the case with other Kubernetes objects, you can deploy Kubeless functions either imperatively using the kubeless tool, or declaratively using a YAML file. In this lab, we are using the declarative way since we can put the YAML file under version control. Our file, function.yml looks as follows:
apiVersion: kubeless.io/v1beta1
kind: Function
metadata:
  name: scraper
  namespace: default
  label:
    created-by: kubeless
    function: scraper
spec:
  runtime: python3.7
  timeout: "180"
  handler: scraper.main
  deps: ""
  checksum: sha256:8c1136a7ecf95aef19c7565b9acb3977645fe98d1f877dd2397aa6455673805e
  function-content-type: url+zip
  function: https://github.com/MagalixCorp/k8s-faas/blob/master/packagae.zip?raw=true
Let’s have a look at the important parts of this definition:
Now, we can apply this definition the same way we do with any other Kubernetes object:
kubectl apply -f function.yml
Once deployed, you will notice that we have a new pod created for us as well as a service, both with the name scraper:
$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
scraper-dfc478fd6-n8w6c   1/1     Running   0          3h37m
$ kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.12.0.1    <none>        443/TCP    11h
scraper      ClusterIP   10.12.5.46   <none>        8080/TCP   9h
Notice that the Service Type is ClusterIP, which means that it can only be called from another pod inside the cluster. Kubeless, however, supports adding an Ingress to the Service so that it’s accessible from the outside world. Please refer to the Kubeless documentation for more information on how to do this.
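For reference, a pod inside the cluster could call the function with a plain HTTP POST to the Service. The sketch below uses only the Python standard library. The build_call and call_function helpers are hypothetical, and the URL assumes the default cluster DNS naming (Service scraper in the default namespace, port 8080 as shown in the output above) and that the function accepts a JSON request body.

```python
import json
import urllib.request

# Assumed in-cluster address: <service>.<namespace>.svc.cluster.local:<port>
FUNCTION_URL = "http://scraper.default.svc.cluster.local:8080"


def build_call(target_url, function_url=FUNCTION_URL):
    """Build a POST request whose JSON body carries the URL to scrape."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        function_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_function(target_url, timeout=30):
    """Send the request and decode the JSON list returned by the function."""
    with urllib.request.urlopen(build_call(target_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating build_call from call_function keeps the request construction easy to inspect without a running cluster; call_function only works from a pod that can resolve the Service name.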
Before going any further, we need to ensure that our function is working. Since it can’t be called except inside the cluster, we have two options:
In this lab, we’ll use the second option since it’s much simpler:
$ kubeless function call scraper --data '{"url":"https://webscraper.io/test-sites/e-commerce/allinone"}'
["Asus VivoBook 15 X540NA-GQ008T Chocolate Black, 15.6\" HD, Pentium N4200, 4GB, 500GB, Windows 10 Home, En kbd", "Acer Swift 1 SF113-31 Silver, 13.3\" FHD, Pentium N4200, 4GB, 128GB SSD, Windows 10 Home", "Asus VivoBook X441NA-GA190 Chocolate Black, 14\", Celeron N3450, 4GB, 128GB SSD, Endless OS, ENG kbd"]
The data is passed to the function in JSON format. You can access the data sent to the function by examining the event['data'] object. So, now that we’re confident that our function responds to HTTP POST requests and returns the expected response, let’s add the wrapper web application.
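Because the handler is an ordinary Python function, its logic can be exercised locally by passing a dictionary shaped like the event. The sketch below uses a stand-in handler (echo_url, a hypothetical name) rather than the scraper, so it runs without network access or third-party libraries. Only the data key described above is assumed to be present in the event.

```python
import json


def echo_url(event, context):
    """Stand-in handler: read the 'url' field from the request payload."""
    data = event["data"]
    # Guard against payloads that are not JSON objects with a 'url' key.
    if not isinstance(data, dict) or "url" not in data:
        return json.dumps({"error": "expected a JSON object with a 'url' key"})
    return json.dumps({"url": data["url"]})


# Mimic the payload sent by the `kubeless function call` command shown earlier.
fake_event = {"data": {"url": "https://webscraper.io/test-sites/e-commerce/allinone"}}
print(echo_url(fake_event, None))
```

The same approach works for the real scraper: import main from scraper.py and pass it an event dictionary, provided requests and bs4 are installed locally.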
The web application we’re using is a slightly modified version of the one we used in our sidecar pattern article. It basically has the following components:
The complete working project can be found at our GitHub repo https://github.com/MagalixCorp/k8s-faas. Now, let’s go through the application files:
The file contains a NodePort service type and a Deployment. The Deployment creates a pod that hosts two containers: the app and the web. Notice the sidecar pattern used here where Nginx relays backend requests to the API server by calling localhost as if both services are running on the same host.
The final step here is to test our work. Navigate to any of your cluster’s node IP addresses on port 32001. Enter https://webscraper.io/test-sites/e-commerce/allinone in the URL box and click submit.
Because our function executed the task so quickly, we can see the results returned to us by exploring the network tab of our browser’s developer tools. In a real-world scenario, the user should just receive a nice “Thank you” message while the function runs asynchronously. When execution is done, there should be some sort of alerting logic to notify clients that their data is ready. The function can store the results in permanent storage where the user can view them via another page.