You just created an amazing Machine Learning model. You found the data within your data warehouse, trained and tuned the model, and finally deployed that model to AWS Sagemaker. The project seems almost done, but you quickly realize that the model results aren’t sitting where the rest of your data is: in your data warehouse, an AWS Redshift database.
One option is Sagemaker’s batch transform. With batch transform, you can send bulk inputs from S3 (but not Redshift) to a Sagemaker model and store the predictions back in S3. To use batch transform for this use case, you’d need to unload data from Redshift to S3, run the transform job, and then copy the predictions back into Redshift. All of these tasks need to be managed in a new script with multiple, complicated moving parts. That’s a lot of painful extra steps just to put Sagemaker results back into the place where you already store your data.
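To see why this gets complicated, here is a rough sketch of the do-it-yourself pipeline in Python. All of the names (bucket, tables, IAM role) are hypothetical placeholders, and the batch transform call is left as a comment because it needs real AWS credentials and a deployed model:

```python
# Sketch of the manual Redshift -> S3 -> Sagemaker -> S3 -> Redshift pipeline.
# Bucket, table, and role names are made-up placeholders.

def unload_sql(table, s3_path, iam_role):
    """Step 1: dump model inputs from Redshift to S3 as CSV."""
    return (
        f"UNLOAD ('SELECT * FROM {table}') "
        f"TO '{s3_path}' IAM_ROLE '{iam_role}' FORMAT AS CSV;"
    )

def copy_sql(table, s3_path, iam_role):
    """Step 3: load the predictions from S3 back into Redshift."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV;"
    )

# Step 2 would run a Sagemaker batch transform job between the two SQL
# steps, e.g. with boto3:
#   boto3.client("sagemaker").create_transform_job(
#       TransformJobName=..., ModelName=...,
#       TransformInput={...S3 input...}, TransformOutput={...S3 output...})

role = "arn:aws:iam::123456789012:role/RedshiftS3Role"
print(unload_sql("model_inputs", "s3://my-bucket/inputs/", role))
print(copy_sql("model_predictions", "s3://my-bucket/outputs/", role))
```

Each step has its own failure modes and credentials, and the glue script has to be scheduled and monitored on top of that.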
Booklet.ai makes it possible to send model predictions from Sagemaker endpoints to your Redshift database in minutes. In this post, we will talk through the steps to get this set up in Booklet.ai.
How Booklet.ai works
Booklet makes it easy to integrate AWS Sagemaker and AWS Redshift directly, using only SQL inputs and no changes to your model. Here’s an overview of how Booklet works:
- Grant Booklet.ai read-only access to a limited number of AWS Sagemaker actions.
- Choose the Sagemaker endpoint from which you’d like to send the predictions to Redshift.
- Set up the source and destination configurations.
- Kick off a dataflow and watch as results flow into Redshift.
Read below for full details.
Sign up for Booklet.ai
Booklet is free to sign up for; no credit card required. Sign up below:
Create an AWS Sagemaker Endpoint
This tutorial assumes you’ve already deployed an ML model to AWS Sagemaker and created an endpoint for the model. See the AWS docs on hosting services for information on this process.
Grant Booklet.ai access to AWS Sagemaker
You need to grant us read-only access to a limited number of Sagemaker actions via an IAM role that our AWS account is allowed to assume.
Follow these steps to create a read-only IAM Role for Booklet:
- Create a new role in the AWS IAM Console.
- Select “Another AWS account” for the Role Type.
- Enter “256039543343” in the Account ID field (this is the Booklet.ai AWS account ID).
- Click the “Next: Permissions” button.
- Click the “Create Policy” button (opens a new window).
- Select the JSON tab. Copy and paste this JSON into the text area.
- Click “Review policy”.
- Name the policy “BookletAWSIntegrationPolicy”.
- Click “Create policy” and close the window.
- Back in the “Create role” window, refresh the list of policies and select the policy you just created.
- Click “Next: Review”.
- Give the role a name such as “BookletAWSIntegrationRole”. Click “Create Role”.
- Copy the Role ARN. It looks something like “arn:aws:iam::123456789012:role/BookletIntegrationRole”.
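Behind the scenes, choosing “Another AWS account” in steps 2–3 attaches a cross-account trust policy to the role so Booklet.ai’s account can assume it. This is the standard trust relationship format, shown here as a sketch using the Booklet.ai account ID from the steps above (the exact document AWS generates may differ slightly):

```python
import json

# Sketch of the cross-account trust policy created by "Another AWS account".
# 256039543343 is the Booklet.ai AWS account ID entered in the wizard.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::256039543343:root"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

The permissions policy you pasted in step 6 controls *what* the role can do; this trust policy only controls *who* can assume it.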
With the AWS Role created and the ARN on your clipboard, we’re almost there. In the Booklet.ai settings, paste in the AWS Role ARN and click the “Save” button:
For more information, check out the Booklet.ai docs
Booklet and AWS are now integrated!
Select Endpoint within Booklet.ai
Click the “New Model” button within Booklet.ai, choose the Sagemaker endpoint you’d like to connect to Redshift, and click “Create”:
You also now have a nice UI to test your model. Check out our post for more information on how to turn your model into a nice demo.
Set up the Source
Now you need to set up the Redshift source. This will tell Booklet.ai what data to query from the Redshift database and send to the model for scoring. You can input the query here:
The first column in the query result should be an identifier (for example: a record ID, email address, or customer ID). The remaining columns are the feature values (or model inputs) in the same order the model expects them. Any SQL query can be entered here, regardless of complexity!
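As a sketch, a source query for a hypothetical churn model might look like the following. The table and column names are made up; the only hard requirement is the column order (identifier first, then features in the order the model expects):

```python
# Hypothetical source query: customer_id comes first as the identifier,
# followed by the feature columns in the exact order the model expects.
source_query = """
SELECT
    customer_id,       -- identifier (first column)
    monthly_spend,     -- feature 1
    months_active,     -- feature 2
    support_tickets    -- feature 3
FROM analytics.customers
WHERE is_active = true
"""

print(source_query.strip())
```

Joins, CTEs, and filters are all fine here, as long as the final column order matches what the model was trained on.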
Set up the Destination
Next, set up how you want to store the results in Redshift! You can write a simple INSERT INTO statement and reference the attributes you want from the Booklet.ai output:
You can reference the sample attributes in the table below the query field after you have run your source at least once! With this query, the results will be appended to the destination table.
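As a sketch, a destination statement for a hypothetical churn-prediction table might look like the following. Both the table name and the `{{...}}` placeholder syntax are illustrative guesses; the actual attribute names to reference come from the sample attributes table shown below the query field:

```python
# Hypothetical destination statement. The {{...}} placeholders stand in for
# Booklet.ai output attributes; check the sample attributes table in the UI
# for the real names and syntax.
destination_sql = """
INSERT INTO analytics.churn_predictions (customer_id, churn_score, scored_at)
VALUES ({{identifier}}, {{score}}, GETDATE())
"""

print(destination_sql.strip())
```

Adding a timestamp column like `scored_at` makes it easy to keep a history of predictions across dataflow runs rather than only the latest batch.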
Run the Dataflow
Now that you have both the Source and the Destination set up, head to the dataflow section and select the source and destination you created above. You can only select one source at a time, but you can send results to as many destinations as you’d like:
Once the dataflow kicks off, you’ll see logs showing the source data getting pulled, those inputs getting scored against the Sagemaker model, and finally the predictions getting sent back to Redshift:
Finally, you can head over to Redshift and see the results populated in your destination table.
See how easy that can be? Just set up a couple of SQL statements for the input and output, run the dataflow, and your results are populated in Redshift! You no longer have to set up complicated, custom scripts and pipe data around to multiple places. Sign up to be an early user of Booklet.ai!
You may also enjoy...
- Machine learning deserves its own flavor of Continuous Delivery
- How to setup a local AWS SageMaker environment for PyTorch
- Turn a ML Model into a fully integrated web app with Booklet.ai
- How to create a REST API for a AWS Sagemaker Endpoint (quickly)
- Create a web app for your AWS Sagemaker ML model (quickly)