The vision for our system architecture was clear from the beginning: we wanted to build a website containing an application which would allow logged-in users to upload an image file and receive a fast prediction from a deep-learning model as to which objects (distinct, predefined parasite eggs), if any, were present in the image.
The vision was clear, but making it a reality was not so straightforward. In the end, however, we pulled it off and built a robust system architecture that we can easily build upon going forward. The diagram below sketches the system architecture, and in this post I will detail each major part of the process. Hopefully it will help others who are trying to build similar functionality into their own projects.
You’ll notice that the diagram is split into two parts. On the left is the model serving system architecture; on the right is the data ingestion, model building and deployment. This split reflects the two independent set-ups in place and how they are connected. This post primarily covers the left half of the diagram: the model serving system architecture; diving into the data ingestion, model building and deployment is out of scope for this blog post. The important thing to be aware of is that the image data is cleaned and stored independently from the model serving system architecture, and that the same is true for the building and deployment of our deep-learning CNN model. Our model has been deployed as a web API using Flask on a separate server from the one hosting our website. Now, let’s dive into all the pieces of the model serving system architecture!
4. We use the AWS Lambda compute service to run back-end code in a serverless environment. Our system architecture actually uses two separate Lambda functions. The first is a script written in Node.js which receives the image (as a binary string) from the front end—transmitted through the aforementioned API. This Lambda function plays an important role in the system, acting as a control center of sorts: sending and receiving data from multiple sources, writing and even deleting data in some cases, and ensuring that results (or any error messages) are returned correctly to the front end.
5. The first action taken by the Lambda function mentioned above is to decode the image from its binary string and send it to an S3 bucket which holds all user-uploaded images. The filename of the image is a timestamp followed by a random string of bytes. Our goal is to add the images submitted by users (who have given us their permission) to our data set in order to continually improve our model.
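Our actual function is written in Node.js, but the equivalent logic can be sketched in Python. This is a minimal sketch, assuming the front end sends the image base64-encoded; the bucket name and the `.jpg` extension here are illustrative assumptions, not our real configuration.

```python
import base64
import os
import time

BUCKET = "user-uploaded-images"  # hypothetical bucket name


def make_filename() -> str:
    # A timestamp followed by a random string of bytes, as described above.
    return f"{int(time.time())}-{os.urandom(8).hex()}.jpg"


def store_image(image_b64: str) -> str:
    """Decode the binary string from the front end and upload it to S3."""
    import boto3  # lazy import so the pure helper above runs without AWS access

    key = make_filename()
    body = base64.b64decode(image_b64)
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=body)
    return key
```

The random suffix keeps two uploads made in the same second from colliding on the timestamp alone.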
6. Next, the Node.js Lambda invokes a second Lambda, this one with a Python runtime. The filename of the image in S3 is passed from the first Lambda to the second. This second Lambda is really just a “middleman” in the current setup; it is in place not for technical reasons but for practical ones, as it has allowed us to build and test the two parts of our architecture separately. We may well eliminate this component later on.
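As a sketch of this hand-off (the function name below is hypothetical, and our first Lambda actually does this in Node.js rather than Python), the synchronous invocation looks like:

```python
import json


def build_event(filename: str) -> bytes:
    # The only data the first Lambda needs to pass along is the S3 filename.
    return json.dumps({"filename": filename}).encode("utf-8")


def invoke_second_lambda(filename: str) -> str:
    """Synchronously invoke the Python Lambda and return its raw payload."""
    import boto3  # lazy import so build_event stays usable without AWS access

    resp = boto3.client("lambda").invoke(
        FunctionName="classifier-middleman",  # hypothetical function name
        InvocationType="RequestResponse",
        Payload=build_event(filename),
    )
    return resp["Payload"].read().decode("utf-8")
```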
7. Our CNN model for classifying images was deployed as a web API using Python and Flask (see the earlier post Serving a Keras Model with Flask) on a SoftLayer virtual server. Our second Lambda calls this API, sending the filename of the image in S3 via a GET request. The image is then downloaded into a temporary directory on the remote server using boto (the AWS SDK for Python) and previously created IAM access credentials. Next, the image is fed into our model (held in memory on that server), and an array of predictions (one for each class) is returned. The Python Lambda then returns this array (as a string) to the Node.js Lambda.
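A minimal sketch of such a Flask endpoint follows; the route, bucket name, and query-parameter name are assumptions, and the model call is a placeholder for the Keras model the server keeps in memory.

```python
import os
import tempfile

from flask import Flask, jsonify, request

app = Flask(__name__)
BUCKET = "user-uploaded-images"  # hypothetical bucket name


def download_image(filename: str, dest_dir: str) -> str:
    """Fetch the uploaded image from S3 using boto3 and IAM credentials."""
    import boto3  # lazy import so the route can be exercised without AWS access

    path = os.path.join(dest_dir, filename)
    boto3.client("s3").download_file(BUCKET, filename, path)
    return path


def predict(image_path: str) -> list:
    # Placeholder for running the in-memory Keras model on the image.
    raise NotImplementedError


@app.route("/predict", methods=["GET"])
def predict_route():
    filename = request.args.get("filename", "")
    if not filename:
        return jsonify(error="missing filename"), 400
    with tempfile.TemporaryDirectory() as tmp:
        path = download_image(filename, tmp)
        return jsonify(predictions=predict(path))
```

Using a temporary directory means each downloaded image is cleaned up automatically once the prediction has been made.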
8. Now we’re back in our “control center” Lambda, ready to act on the data received from the Flask API. First, our code checks that no error was returned (an error almost always indicates that the server was down); if there was an error, it is logged in CloudWatch. Then, we check the value of a variable passed from the front end which we haven’t mentioned yet: a boolean we call agree, which indicates whether or not the user agreed to let us keep their image and the data about predictions made on that image. If agree is true, we save the filename of the image in S3 as the primary key in an Amazon DynamoDB table, along with the class predictions from the model (if there were no errors). If, on the other hand, agree is false, rather than storing anything in DynamoDB, an extra command deletes the image from S3 altogether. Finally, the predictions (or error messages, if any) are returned to the front end as a callback via API Gateway.
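The branching in step 8 can be sketched in Python with injected clients (our actual Lambda is Node.js, and the item attribute names below are assumptions); passing the DynamoDB table and S3 client in as arguments keeps the decision logic easy to test.

```python
def handle_result(filename, predictions, agree, error, table, s3,
                  bucket="user-uploaded-images"):  # hypothetical bucket name
    """Store, delete, or just report, based on the agree flag and any error.

    `table` is a DynamoDB-Table-like object and `s3` an S3-client-like object.
    """
    if error is not None:
        # In Lambda, print output lands in CloudWatch Logs.
        print(f"model API error: {error}")
        return {"error": error}
    if agree:
        # The S3 filename is the primary key; predictions are stored with it.
        table.put_item(Item={"filename": filename, "predictions": predictions})
    else:
        # The user declined, so remove their uploaded image entirely.
        s3.delete_object(Bucket=bucket, Key=filename)
    return {"predictions": predictions}
```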
And that is how a web app—which allows users to receive a prediction from a deep-learning model based on an image—is built!