Infrastructure-based Method

Overview

Paigo can measure a SaaS customer's usage by inspecting usage of the underlying shared resources and mapping that data to individual tenants, deriving accurate per-customer usage in real time. Among the usage measurement and collection methods, the infrastructure-based method requires the least setup and maintenance. For a comparison with the other methods, see the Meter Usage Data at Scale chapter.

How it Works

Within Paigo's Usage Measurement and Collection engine, there is a component called the Infrastructure Connector. At a regular interval, the Infrastructure Connector uses the cloud API to gather the information needed to calculate precise resource usage at the tenant level. It then uses metadata and algorithms to split resource usage into per-tenant usage and stores the data in the usage journal.

As an example, consider a common design pattern in the SaaS industry: customer data is saved in a shared blob storage system, such as an S3 bucket, under different file paths. Customer A's archives are saved under s3://all-customer-archive/customer-a-2022-01-01.json and customer B's archives are saved under s3://all-customer-archive/customer-b-2022-01-01.json. Assume both customers are high-usage customers, so gigabytes of data move into and out of their S3 folders every hour. Unfortunately, AWS S3 exposes neither the data size at the prefix level nor the total bucket size at hourly frequency. Paigo automatically measures the data usage of customer A and customer B with Infrastructure-based Measurement. Below are the high-level steps Paigo Usage Measurement and Collection takes to measure usage in this example:

  1. At a frequent interval, Paigo calls the AWS API to list all the objects in S3.

  2. It walks the virtual directory (prefix) recursively to construct the hierarchy and filter for the target object patterns, folders, or prefixes.

  3. It groups all of the objects by metadata key-value pairs.

  4. It calculates the total size a particular customer has used for the short sample period (such as 5 minutes).

  5. The usage record value is timestamped and indexed in the backend journal for future aggregation.

  6. As the data size consumed by customers A and B goes up and down, Paigo samples the usage size frequently and keeps the usage journal in the backend.

  7. At the top of the hour, the aggregation process kicks in to calculate the total usage of customers A and B respectively for the past hour, based on the raw data collected every 5 minutes.
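
The grouping and sizing steps can be sketched in Python as follows. This is a minimal illustration, not Paigo's implementation. It assumes object records shaped like the "Contents" entries of an S3 ListObjectsV2 response (a "Key" and a "Size" in bytes), and it assumes the tenant can be derived from the object key with a hypothetical naming pattern modeled on the example paths. The names KEY_PATTERN and usage_by_customer are made up for this sketch.

```python
import re
from collections import defaultdict

# Hypothetical key pattern, modeled on the example paths above:
# "customer-a-2022-01-01.json" -> tenant "customer-a".
KEY_PATTERN = re.compile(r"^(customer-[a-z0-9]+)-\d{4}-\d{2}-\d{2}\.json$")


def usage_by_customer(objects):
    """Sum object sizes (bytes) per customer for one sample.

    `objects` is an iterable of dicts with "Key" and "Size", the same
    shape as the "Contents" entries of an S3 ListObjectsV2 response.
    Objects whose key does not match the pattern are ignored.
    """
    totals = defaultdict(int)
    for obj in objects:
        match = KEY_PATTERN.match(obj["Key"])
        if match:
            totals[match.group(1)] += obj["Size"]
    return dict(totals)


# Made-up sample data for illustration only.
sample = [
    {"Key": "customer-a-2022-01-01.json", "Size": 1_500},
    {"Key": "customer-a-2022-01-02.json", "Size": 2_500},
    {"Key": "customer-b-2022-01-01.json", "Size": 4_000},
    {"Key": "readme.txt", "Size": 10},
]
print(usage_by_customer(sample))  # {'customer-a': 4000, 'customer-b': 4000}
```

Each call produces one sample; in the flow described above, such samples would be timestamped and written to the usage journal before the hourly aggregation runs.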

Use Case

For infrastructure-focused SaaS businesses, product metrics are typically closely related to the usage of resources. Those resources usually fall into the categories of compute, storage, and network. This measurement method is the best way to automatically calculate usage in real time and bill customers on it. Below is a list of example product metrics that are a good fit for Infrastructure-based Measurement. Note that the list is not exhaustive.

  • Compute Time / GPU Milliseconds / Query Time / Execution Time / Job Running Time: These metrics are commonly seen in data platforms, AI/ML platforms, and CI/CD platforms. The underlying resources are typically computing systems such as containers, VMs, or Kubernetes.

  • Data Storage / Archive / Snapshot / Backup / Log File: Product metrics for data at rest and data in transit are more commonly seen in big data industries.

  • Network Egress / Peering / PrivateLink / Load Balancer Network: Network usage metrics are common for infrastructure SaaS, where infrastructure cost represents the majority of the business's COGS.

Set up Measurement

Navigate to the Measurement tab and click the New Measurement button to see the Measurement Template Table. Choose one of the infrastructure-based measurement templates from the table to open the measurement creation form. In the creation form, some fields are pre-filled based on the template. Provide values for the other required or optional fields. Instructions for some of the fields in the form:

  • Measurement Frequency: This field dictates how frequently Paigo calculates the raw usage data per tenant. The only supported mode is Automatic. In this mode, Paigo decides the best frequency to sample usage based on many factors, such as the type of infrastructure, platform, region, success rate, and API throttling.

  • IAM Role ARN and External ID: The read-only access role that Paigo uses to interact with the cloud API. See the Configure IAM Role page for details on setting up the role.

The next step is to link the measurement to dimensions. A dimension represents the abstract concept of a product metric, whereas a measurement represents the implementation of usage measurement and collection. When a measurement ID is entered in the Measurement ID field of a dimension, the measured usage data is treated as the data points for that particular dimension.

As an example, suppose a dimension is defined as Job Running Time in Minutes, and a measurement is defined as the Elapsed Time of Serverless Container. Attaching the measurement ID instructs Paigo to treat every usage value measured from the serverless container as the running time of a job in minutes.

Note that the actual trigger of measurement is the attachment to a dimension. When a measurement is first created, Paigo doesn't schedule any process to start measuring. Once the measurement is attached to a dimension, processes are scheduled to measure usage on a recurring basis.

Required Tagging Schema

Paigo uses metadata to qualify measurement data and differentiate tenants. For the underlying infrastructure resources to be identified as being used by a particular tenant, Paigo requires the following tags.

  • paigoDimensionId: REQUIRED. Comma-delimited unique identifiers of the dimensions this usage record is associated with, assigned by Paigo during dimension creation. Example: e6a4a1ab-7fd6-43cf-b44f-73a2539fdf85,7946113e-b1b4-11ed-afa1-0242ac120002.

  • paigoCustomerId: REQUIRED. The Customer ID of the customer this usage record is attributed to, assigned by Paigo during customer creation. Example: e8366954-6f36-47e9-8431-ac95f88b5cc7.

The above tagging schema can be used on any infrastructure resource measured by Paigo.
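
To make the schema concrete, here is a small Python sketch of how a resource's tags could be checked and parsed. It is an illustration under stated assumptions, not Paigo's code: the function name parse_paigo_tags and the TenantTags type are hypothetical, and tolerating whitespace around the comma-delimited IDs is an assumption rather than documented behavior.

```python
from typing import Mapping, NamedTuple, Optional


class TenantTags(NamedTuple):
    dimension_ids: list
    customer_id: str


def parse_paigo_tags(tags: Mapping[str, str]) -> Optional[TenantTags]:
    """Return the parsed tags, or None if a required tag is missing or empty."""
    raw_dimensions = tags.get("paigoDimensionId", "")
    customer_id = tags.get("paigoCustomerId", "").strip()
    # paigoDimensionId is comma-delimited; drop empty fragments and whitespace
    # (whitespace tolerance is an assumption of this sketch).
    dimension_ids = [d.strip() for d in raw_dimensions.split(",") if d.strip()]
    if not dimension_ids or not customer_id:
        return None
    return TenantTags(dimension_ids, customer_id)


# Tag values taken from the examples above.
tags = {
    "paigoDimensionId": "e6a4a1ab-7fd6-43cf-b44f-73a2539fdf85,"
                        "7946113e-b1b4-11ed-afa1-0242ac120002",
    "paigoCustomerId": "e8366954-6f36-47e9-8431-ac95f88b5cc7",
}
parsed = parse_paigo_tags(tags)
print(len(parsed.dimension_ids), parsed.customer_id)
```

A resource for which this function returns None would not qualify for measurement, since both tags are required.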

EC2 Instance Compute Time (Measurement Template)

Measurement Property      Value
Measurement Frequency     5 Minutes
Precision                 Minute
Unit                      Hour
Aggregation Method        Sum
Aggregation Interval      Hour
Usage Increment           1
Rounding                  Ceiling

Compute time of EC2 instances is the combined length of time that AWS EC2 instances run in a good state. Paigo measures the running time of AWS EC2 instances: at a predefined frequency, Paigo collects the running status of qualified instances and attributes usage to the right customer automatically. For an instance to qualify for usage calculation, the following conditions must be met:

  • Instances must be in the right region, as specified in measurement configuration.

  • Instances must be viewable by the role Paigo assumes, as specified in measurement configuration.

  • Instances must be in the RUNNING state. Hibernated, stopped, and terminated instances are not measured as usage.

  • Instances must be tagged with the correct tagging schema, as specified in Required Tagging Schema.

  • On-demand, reserved, and spot instances all qualify for usage calculation.

For multiple qualified instances, Paigo calculates the usage of a sample period as the sum of all running time. For example, if Paigo samples usage every 5 minutes and there are 3 qualified instances, the total usage measured by Paigo for that sample period is 15 minutes.
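
The arithmetic above can be sketched in Python. This is a simplified illustration, not Paigo's implementation: the instance record shape (a "state" string and a "tags" dict) and the function name compute_minutes are assumptions, the lowercase "running" follows the state name used by the EC2 API, and region and role visibility are assumed to have been handled before this step.

```python
from collections import defaultdict

SAMPLE_MINUTES = 5  # sampling period used in the example above


def compute_minutes(instances, sample_minutes=SAMPLE_MINUTES):
    """Attribute one sample period of running time to each customer.

    `instances` is a list of dicts with a "state" string and a "tags" dict;
    this shape is an assumption made for illustration.
    """
    usage = defaultdict(int)
    for inst in instances:
        customer = inst["tags"].get("paigoCustomerId")
        has_dimension = bool(inst["tags"].get("paigoDimensionId"))
        # Only running, correctly tagged instances count toward usage.
        if inst["state"] == "running" and customer and has_dimension:
            usage[customer] += sample_minutes
    return dict(usage)


# Made-up tag values for illustration only.
tagged = {"paigoCustomerId": "cust-1", "paigoDimensionId": "dim-1"}
instances = [
    {"state": "running", "tags": tagged},
    {"state": "running", "tags": tagged},
    {"state": "running", "tags": tagged},
    {"state": "stopped", "tags": tagged},  # not running: ignored
    {"state": "running", "tags": {}},      # untagged: ignored
]
print(compute_minutes(instances))  # {'cust-1': 15}
```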

The metadata collected on each usage record includes all the properties of the instances, such as instance ID, VPC, tags, and network interfaces.

EC2 Egress (Measurement Template)

Measurement Property      Value
Measurement Frequency     5 Minutes
Precision                 Byte
Unit                      Byte
Aggregation Method        Sum
Aggregation Interval      Hour
Usage Increment           1
Rounding                  Ceiling

EC2 Egress is the outbound network traffic generated from AWS EC2 instances to the internet or to other parts of the network on AWS. At a predefined frequency, Paigo measures the egress traffic originating from AWS EC2 instances and attributes usage to the right customer automatically. For an instance to qualify for egress usage calculation, the following conditions must be met:

  • Instances must be in the right region, as specified in measurement configuration.

  • Instances must be viewable by the role Paigo assumes, as specified in measurement configuration.

  • Instances' CloudWatch metrics must be viewable by the role Paigo assumes, as specified in measurement configuration.

  • Instances must be tagged with the correct tagging schema, as specified in Required Tagging Schema.

For egress traffic from multiple qualified instances, Paigo calculates the usage of a sample period as the sum of egress traffic from all instances. For example, if Paigo samples usage every 5 minutes and there are 3 qualified instances with egress usage of 1,000 bytes, 2,000 bytes, and 3,000 bytes, the total usage measured by Paigo is 6,000 bytes. Also note that the measured egress data has a 10-minute lag. For example, egress traffic that occurred between 00:00 and 00:05 is measured by Paigo at around 00:10.
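
As a rough Python sketch of the summation and the lag described above: the input shape (one pair of customer ID and byte count per qualified instance) and the function names are assumptions for illustration, and the lag is counted from the start of the sample window because that is what the example in the text implies.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MEASUREMENT_LAG = timedelta(minutes=10)  # lag stated in the text above


def egress_by_customer(samples):
    """Sum egress bytes per customer for one sample period.

    `samples` is a list of (customer_id, bytes_out) pairs, one per
    qualified instance; this input shape is an assumption.
    """
    totals = defaultdict(int)
    for customer_id, bytes_out in samples:
        totals[customer_id] += bytes_out
    return dict(totals)


def measured_at(window_start):
    """Approximate time at which a sample window is measured."""
    return window_start + MEASUREMENT_LAG


# Three instances belonging to one made-up customer ID.
samples = [("cust-1", 1_000), ("cust-1", 2_000), ("cust-1", 3_000)]
print(egress_by_customer(samples))             # {'cust-1': 6000}
print(measured_at(datetime(2022, 1, 1, 0, 0)))  # 2022-01-01 00:10:00
```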

The frequency of egress measurement can be increased up to once per minute. However, AWS CloudWatch charges additional costs for the increased frequency.

The metadata collected on each usage record includes all the properties of the instances from which the egress traffic originates, such as instance ID, VPC, tags, and network interfaces.

EBS Volume and Snapshot (Measurement Template)

Measurement Property      Value
Measurement Frequency     5 Minutes
Precision                 Gigabyte
Unit                      Gigabyte
Aggregation Method        Max
Aggregation Interval      Hour
Usage Increment           1
Rounding                  Ceiling

EBS volume size and snapshot size are two different dimensions that Paigo can automatically measure with measurement templates. The underlying technology is the same. At a predefined frequency, Paigo collects the information of qualified EBS volumes or snapshots and attributes usage to the right customer automatically. For a volume or a snapshot to qualify for usage calculation, the following conditions must be met:

  • The volume or the snapshot must be in the right region, as specified in measurement configuration.

  • The volume or the snapshot must be viewable by the role Paigo assumes, as specified in measurement configuration.

  • The volume or the snapshot must be tagged with the correct tagging schema, as specified in Required Tagging Schema.

For multiple qualified volumes or snapshots, Paigo calculates the usage of a sample period as the sum of the sizes of all volumes or of all snapshots. For example, if in a particular sample period there are three qualified volumes of size 10 GB, 30 GB, and 40 GB, the total usage measured by Paigo is 80 GB.
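
The per-sample sum, combined with the Max aggregation and Ceiling rounding listed in the template table, can be sketched in Python as below. This is one plausible reading of those template properties, not a documented algorithm; the function names and the sample values are made up for illustration.

```python
import math


def sample_total_gb(sizes_gb):
    """Total size of all qualified volumes (or snapshots) in one sample."""
    return sum(sizes_gb)


def hourly_usage_gb(samples_gb):
    """Aggregate one hour of samples with Max, rounded up to a whole GB.

    Assumes "Aggregation Method: Max" means the largest sample in the hour
    and "Rounding: Ceiling" with increment 1 means rounding up to 1 GB.
    """
    if not samples_gb:
        return 0
    return math.ceil(max(samples_gb))


print(sample_total_gb([10, 30, 40]))        # 80
print(hourly_usage_gb([80, 80, 120.5, 90]))  # 121
```

With Max aggregation, a customer whose volumes briefly grow within the hour is measured at the peak sample for that hour rather than at the average.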

The metadata collected on each usage record includes all the properties of the volumes, such as IOPS, provisioned throughput, and volume ID.

Virtual Machine Compute Time on Azure (Measurement Template)

Measurement Property      Value
Measurement Frequency     5 Minutes
Precision                 Minute
Unit                      Hour
Aggregation Method        Sum
Aggregation Interval      Hour
Usage Increment           1
Rounding                  Ceiling

Compute time of virtual machines on Azure is the combined length of time that virtual machines run in a good state. Paigo measures the running time of Azure virtual machines: at a predefined frequency, Paigo collects the running status of qualified virtual machines and attributes usage to the right customer automatically. For a virtual machine to qualify for usage calculation, the following conditions must be met:

  • Virtual machines must be in the right region, as specified in measurement configuration.

  • Virtual machines must be viewable by the role Paigo assumes, as specified in measurement configuration.

  • Virtual machines must be in the RUNNING state. Deallocated and stopped virtual machines are not measured as usage.

  • Virtual machines must be tagged with the correct tagging schema, as specified in Required Tagging Schema.

For multiple qualified virtual machines, Paigo calculates the usage of a sample period as the sum of all running time. For example, if Paigo samples usage every minute and there are three qualified virtual machines, the total usage measured by Paigo for that sample period is 3 minutes.
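
To show how such per-sample minutes might roll up into the hourly figure implied by the template table (Sum aggregation, Hour unit, Usage Increment of 1, Ceiling rounding), here is a hedged Python sketch. The exact semantics of those properties are not spelled out on this page, so the interpretation below (sum the minutes, convert to hours, round up to the next increment) is an assumption, as is the function name.

```python
import math


def hourly_compute_hours(sample_minutes, increment=1):
    """Sum sampled running minutes, convert to hours, round up to the increment.

    `sample_minutes` holds one value per sample taken within the hour.
    """
    total_hours = sum(sample_minutes) / 60
    return math.ceil(total_hours / increment) * increment


# Three VMs sampled every minute for a full hour: 60 samples of 3 minutes.
print(hourly_compute_hours([3] * 60))  # 3
# The same three VMs running for only 25 minutes of the hour.
print(hourly_compute_hours([3] * 25))  # 2
```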

The metadata collected on each usage record includes all the properties of the virtual machines, such as virtual machine name, ID, and location.

Managed Disk on Azure (Measurement Template)

Measurement Property      Value
Measurement Frequency     5 Minutes
Precision                 Gigabyte
Unit                      Gigabyte
Aggregation Method        Max
Aggregation Interval      Hour
Usage Increment           1
Rounding                  Ceiling

Paigo measures the size of Azure managed disks. At a predefined frequency, Paigo collects the size information of qualified managed disks and attributes usage to the right customer automatically. For a managed disk to qualify for usage calculation, the following conditions must be met:

  • Managed disks must be in the right region, as specified in measurement configuration.

  • Managed disks must be viewable by the role Paigo assumes, as specified in measurement configuration.

  • Managed disks must be tagged with the correct tagging schema, as specified in Required Tagging Schema.

For multiple qualified managed disks, Paigo calculates the usage of a sample period as the sum of the sizes of all disks. For example, if in a particular sample period there are three qualified managed disks of size 10 GB, 30 GB, and 40 GB, the total usage measured by Paigo is 80 GB.

The metadata collected on each usage record includes all the properties of the managed disks, such as disk name, ID, location, and configurations.
