You just created an amazing machine learning model. You've found the data in your data warehouse, trained and tuned the model, and finally deployed it to AWS Sagemaker. The project seems almost done, but you quickly realize that the model's predictions aren't sitting where the rest of your data lives: in your data warehouse, an AWS Redshift database.

One option is to try out Sagemaker's batch transform. With batch transform, you can send bulk inputs from S3 (but not Redshift) into a Sagemaker model, and then store the predictions back in S3. To use batch transform for this use case, you'll need to move data from Redshift to S3 and back again, and all of those steps have to be orchestrated by a new script with multiple, complicated moving parts. That's a lot of frustrating extra plumbing just to put Sagemaker results back where the rest of your data lives.
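To make that concrete, here's a rough sketch of just the Redshift side of such a script. The table names, S3 paths, and IAM role below are hypothetical, and the batch transform job itself still has to be created and monitored separately:

```sql
-- Sketch only: table names, S3 paths, and the IAM role are hypothetical.
-- Step 1: export model inputs from Redshift to S3 for batch transform.
UNLOAD ('SELECT lead_id, pages_viewed, emails_opened FROM leads.lead_features')
TO 's3://my-ml-bucket/batch-input/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftS3Role'
CSV;

-- Step 2 (outside SQL entirely): create and poll a Sagemaker batch
-- transform job that reads s3://my-ml-bucket/batch-input/ and writes
-- predictions to s3://my-ml-bucket/batch-output/.

-- Step 3: load the predictions from S3 back into Redshift.
COPY leads.leads_predictions
FROM 's3://my-ml-bucket/batch-output/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftS3Role'
CSV;
```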

Booklet.ai makes it possible to send model predictions from Sagemaker endpoints to your Redshift database in minutes. In this post, we'll walk through the steps to get this set up in Booklet.ai.

[Image: database dataflow]

How Booklet.ai works

Booklet makes it easy to integrate AWS Sagemaker and AWS Redshift directly, with only SQL inputs and no changes to your model. Here's an overview of how Booklet works:

  1. Grant Booklet.ai read-only access to a limited number of AWS Sagemaker actions.
  2. Choose the Sagemaker endpoint from which you’d like to send the predictions to Redshift.
  3. Set up the source and destination configurations.
  4. Kick off a dataflow and watch as results flow into Redshift.

Read below for full details.

Sign up for Booklet.ai

Booklet is free to sign up for, no credit card required. Sign up below:

Sign Up for Booklet.ai  ▶

Create an AWS Sagemaker Endpoint


This tutorial assumes you’ve already deployed an ML model to AWS Sagemaker and created an endpoint for the model. See the AWS docs on hosting services for information on this process.

Grant Booklet.ai access to AWS Sagemaker

You need to grant Booklet.ai read-only access to a limited number of Sagemaker actions via an IAM role that our AWS account can assume.

Follow these steps to create a read-only IAM Role for Booklet:

  1. Create a new role in the AWS IAM Console.
  2. Select “Another AWS account” for the Role Type.
  3. Enter “256039543343” in the Account ID field (this is the Booklet.ai AWS account id).
  4. Click the “Next: Permissions” button.
  5. Click the “Create Policy” button (opens a new window).
  6. Select the JSON tab. Copy and paste this JSON into the text area.
  7. Click “Review policy”.
  8. Name the policy “BookletAWSIntegrationPolicy”.
  9. Click “Create policy” and close the window.
  10. Back in the “Create role” window, refresh the list of policies and select the policy you just created.
  11. Click “Next: Review”.
  12. Give the role a name such as “BookletAWSIntegrationRole”. Click “Create Role”.
  13. Copy the Role ARN. It looks something like “arn:aws:iam::123456789012:role/BookletIntegrationRole”.

With the AWS Role created and the ARN on your clipboard, we’re almost there. In the Booklet.ai settings, paste in the AWS Role ARN and click the “Save” button:

[Screenshot: Booklet.ai settings]

For more information, check out the Booklet.ai docs.

Booklet and AWS are now integrated!

Select an Endpoint within Booklet.ai

Click the “New Model” button within Booklet.ai, choose the Sagemaker endpoint you’d like to connect to Redshift, and click “Create”:

[Screenshot: endpoint name UI]

You now also have a nice UI to test your model; check out our post for more information on how to turn your model into a nice demo.

Set up the Source

Now you need to set up the Redshift source. This will tell Booklet.ai what data to query from the Redshift database and send to the model for scoring. You can input the query here:

[Screenshot: source query configuration]

The first column in the query result should be an identifier (for example: a record id, email address, or customer id). The remaining columns are the feature values (model inputs) in the order the model expects them. Any SQL query can be entered here, regardless of complexity!
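For example, a source query for a hypothetical lead scoring model might look like this (the table and column names are illustrative, not required by Booklet.ai):

```sql
-- Illustrative source query: the first column is the identifier,
-- and the remaining columns are features in the order the model expects.
SELECT
    lead_id,            -- identifier
    pages_viewed,       -- feature 1
    emails_opened,      -- feature 2
    days_since_signup   -- feature 3
FROM leads.lead_features;
```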

Set up the Destination

Next, set up how you want to store the results in Redshift. You can write a simple INSERT INTO statement and reference the attributes you want from the Booklet.ai output:

[Screenshot: destination query configuration]

After you have run your source at least once, sample attributes appear in the table below the query field for reference. With this query, the results will be appended to the leads.leads_predictions table.
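As a sketch, a destination statement might look like the following. The attribute placeholders (:id and :score) are hypothetical; use the exact attribute names that Booklet.ai shows below the query field:

```sql
-- Illustrative destination statement. The placeholders :id and :score
-- are hypothetical; substitute the attribute names shown in the Booklet.ai UI.
INSERT INTO leads.leads_predictions (lead_id, predicted_score)
VALUES (:id, :score);
```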

Run the Dataflow

Now that you have both a Source and a Destination, head to the dataflow section and select the source and destination that you set up above. You can only select one source at a time, but you can send results to as many destinations as you'd like:

[Screenshot: running the dataflow]

Once the dataflow kicks off, you'll see logs showing the source data getting pulled, those inputs getting scored against the Sagemaker model, and finally the predictions getting sent back to Redshift:

[Screenshot: dataflow logs]

Finally, you can head over to Redshift and see the results populated in your leads.leads_predictions table:

[Screenshot: querying predictions in Redshift]
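A quick query like this one (the predicted_score column name is illustrative) confirms that the predictions landed:

```sql
-- Check the freshly written predictions.
SELECT *
FROM leads.leads_predictions
ORDER BY predicted_score DESC
LIMIT 10;
```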

See how easy that can be? Just set up a couple of SQL statements for the input and output, run the dataflow, and your results are populated in Redshift! You no longer have to write complicated, custom scripts that pipe data around to multiple places. Sign up to be an early user of Booklet.ai!

Sign Up for Booklet.ai  ▶
