
GTS Handling and Storage

In this tutorial, you will learn how to retrieve, create, modify, and store GTS. You will understand the link between the token system and GTS storage. It is a key milestone in your learning path.

For this tutorial, you need your own running instance of Warp 10 (see the getting started guide). You should also read the basic concepts and the token mechanism pages.

GTS internals

[Figure: gts]

NEWGTS 'gammasensor' RENAME
{ 'unit' 'm/s2' 'localisation' 'front left' } RELABEL
{ 'modelisationID' '3211' } SETATTRIBUTES
NOW 45.823411 -1.2342111 450000 -0.042 ADDVALUE
NOW 1 s + 45.823411 -1.2342111 450000 -0.042 ADDVALUE

The JSON representation of a GTS object is close to the figure above. The GTS contains:

  • "c": the classname, a string
  • "l": the labels, a MAP of string/string key/value
  • "a": the attributes, a MAP of string/string key/value
  • "v": a table of values.

The table of values can have two to five columns:

  • [ 0 3.14 ]: timestamp, value
  • [ 0 12 3.14 ]: timestamp, elevation, value
  • [ 0 -45.0 0.2 3.14 ]: timestamp, latitude, longitude, value
  • [ 0 48.44218 -4.41427 80000 3.14 ]: timestamp, latitude, longitude, elevation, value

The timestamp is a signed LONG. The default platform configuration uses microseconds. With this configuration, you cover from year -290308 to year 294247. Depending on your domain, you may need nanosecond precision, which restricts the range to years 1677 through 2262. If you plan to use archaeological data, you can also change the platform settings to millisecond timestamps.
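The upper bound quoted above can be checked with a little stack arithmetic. This is only a sketch: it assumes a platform configured with microsecond time units and approximates a year as 365.2425 days, so the result should land close to year 294247 rather than on an exact date.

```warpscript
// Assumption: the platform time unit is the microsecond.
MAXLONG      // largest signed 64-bit value, i.e. the largest possible timestamp
STU /        // STU is the number of time units per second: result in seconds
86400 /      // seconds to days
365.2425 /   // days to mean Gregorian years (DOUBLE division)
1970 +       // years elapsed since the Unix epoch to a calendar year
```

Replacing MAXLONG with MINLONG gives the lower bound in the same way.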

Latitude and longitude are double-precision inputs, but they are stored internally as a highly optimized HHCode. This format is tailor-made for latitude and longitude. Do not try to store other values here.

Elevation is a long, in millimeters. You can store any other long value inside this field.

The value can be one of the four allowed types: BOOLEAN, LONG, DOUBLE, STRING.

In a GTS, you can put any of the four allowed types and even mix them, but this is not advised. The type of the GTS is determined at read time. The type of the first read value will set the GTS type.

When you use the text input format and plan to store DOUBLE values, be sure to add .0 to whole numbers so that Warp 10 clearly knows you want a DOUBLE. This error is common when you use the API update endpoint. Here is a WarpScript which takes the same input format and parses it to build a GTS:

//example of error in a DOUBLE gts input
// there is no elevation
// between <' '> this is a multiline string
<'
23/48.0:-4.5/ bar{label0=val0} 0
23/48.0:-4.5/ bar{label0=val0} 3.14
23/48.0:-4.5/ bar{label0=val0} 4.1
'>
PARSE

You see that the values are rounded to the nearest integer.
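For comparison, here is the same input with the first value written as 0.0. This is a minimal sketch of the fix described above: since the first value read is now a DOUBLE, the GTS should be typed as DOUBLE and the following values should keep their decimals.

```warpscript
// same input as before, but the first value is written 0.0 instead of 0
// there is still no elevation
<'
23/48.0:-4.5/ bar{label0=val0} 0.0
23/48.0:-4.5/ bar{label0=val0} 3.14
23/48.0:-4.5/ bar{label0=val0} 4.1
'>
PARSE
```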

RAM versus Storage

The storage part of Warp 10 is designed to store GTS. When you store a GTS, you cannot have different values for the same timestamp. When you manipulate a GTS in RAM (on the stack), the same timestamp can appear several times. When you store such a GTS with the UPDATE function, the storage layer removes duplicate timestamps, keeping the value of the last one.

The examples below cannot be executed here: you need to practice with your own instance and your own tokens.

'DataLakeW' 'Wt' STORE // store write token
'DataLakeR' 'Rt' STORE // store read token
NEWGTS 'testGTS' RENAME
0 NaN NaN NaN 3.14 ADDVALUE // timestamp 0
1 45.0 2.0 NaN 4.0 ADDVALUE // timestamp 1
0 45.0 2.0 NaN 4.0 ADDVALUE // timestamp 0
0 NaN NaN 22 5.0 ADDVALUE // timestamp 0
DUP // leave a copy on the stack for visualization
$Wt UPDATE
{
  'token' $Rt
  'class' 'testGTS'
  'labels' {}
  'end' NOW // timestamp or ISO8601 string
  'count' 10 // limit to last 10 points per gts
} FETCH

Hidden property RENAMED

As explained in another tutorial, erasing your own data with UPDATE is easy. For example, you FETCH some data, apply a mean mapper, then UPDATE the result: oops, you have just erased your original data. Warp 10 has a security mechanism to prevent this: if you do not rename a GTS, you cannot UPDATE it. A GTS object has a 'I was renamed' security flag. You cannot see it in the stack JSON representation.
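The safe pattern can be sketched as follows: fetch, transform, rename, then store. The token values are placeholders, the '.mean' suffix is an arbitrary naming choice for this illustration, and the script assumes the fetch returns at least one GTS.

```warpscript
'DataLakeW' 'Wt' STORE // placeholder: use your own write token
'DataLakeR' 'Rt' STORE // placeholder: use your own read token

// fetch the last 100 points of testGTS
{ 'token' $Rt 'class' 'testGTS' 'labels' {} 'end' NOW 'count' 100 } FETCH
'raw' STORE

// moving average on a sliding window of 2 points before and 2 points after
[ $raw mapper.mean 2 2 0 ] MAP

// take the first resulting GTS and give it a new classname
0 GET
DUP NAME '.mean' + RENAME

// the renamed series is stored next to the original instead of overwriting it
$Wt UPDATE
```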

Hidden property BUCKETIZED

When you fetch a GTS, or create it in your WarpScript, it is a raw GTS. It may have a regular period, possibly with missing points, and its timestamps may or may not be aligned on a clock (a 10.005 ms period, for example).

If you want to MAP or REDUCE, you need aligned series, sharing common ticks. To realign GTS, you will use BUCKETIZE. A bucketized Geo Time Series has at most one measurement per bucket, and some buckets may have no measurement. As soon as the GTS is bucketized, you can use specific functions to fill empty buckets, detect outliers, or detect patterns. A GTS object has an 'I am bucketized with this bucket span and this bucket end' flag. You cannot see it in the stack JSON representation.
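Although the flag is not part of the JSON representation, it can be read back with dedicated functions. The sketch below builds a small irregular series (the classname and values are made up for the illustration), bucketizes it into one-second buckets, and then reads the hidden bucket properties.

```warpscript
// an irregular raw series, for illustration only
NEWGTS 'rawsensor' RENAME
0 ms    NaN NaN NaN 1.0 ADDVALUE
1100 ms NaN NaN NaN 2.0 ADDVALUE
1900 ms NaN NaN NaN 4.0 ADDVALUE
4200 ms NaN NaN NaN 3.0 ADDVALUE
'gts' STORE

// 1 s buckets; lastbucket 0 and bucketcount 0 let BUCKETIZE compute them
[ $gts bucketizer.mean 0 1 s 0 ] BUCKETIZE
0 GET 'bucketized' STORE

$bucketized BUCKETSPAN  // span of each bucket, in platform time units
$bucketized LASTBUCKET  // end timestamp of the most recent bucket
$bucketized BUCKETCOUNT // number of buckets
$bucketized             // the bucketized series itself
```

Calling the same three functions on $gts instead of $bucketized is a quick way to check whether a series has been bucketized.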

Hidden attributes

A stored GTS has three special labels linked to the storage security model:

  • .app: the application name. You can see it in the stack JSON representation. Even if you try to override it manually, it will be enforced by the app name contained in your write token.
  • .owner: the owner UUID. You cannot see it, it is enforced by your write token.
  • .producer: the producer UUID. You cannot see it, it is enforced by your write token.

When you create a GTS on the stack with WarpScript, these labels are not defined. As you will see below, write tokens may add other labels, which will be visible in the stack JSON representation. The following chapter explains in detail how tokens interact with GTS storage.
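To see which labels actually come back from the storage, you can list the labels of the stored series. This is a small sketch that assumes the testGTS series from the earlier example has been stored and that the read token is a placeholder for your own.

```warpscript
'DataLakeR' 'Rt' STORE // placeholder: use your own read token

// FIND only queries the directory: it returns GTS without datapoints
{ 'token' $Rt 'class' 'testGTS' 'labels' {} } FIND

// LMAP pushes each element, then its index: drop the index, keep the label MAP
<% DROP LABELS %> LMAP
```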

GTS Storage

Write process

When you push a GTS to the update endpoint, or with the UPDATE function, what does the Warp 10 storage engine exactly do?

  • Decrypt the token content. If the token is not encrypted with the platform key (warp.aes.token in your configuration) or if it is a read token, it stops there and raises an invalid token error.
  • In the write token, Warp 10 extracts the expiry timestamp. If the token is expired, it stops there with an error.
  • In the write token, Warp 10 extracts the .app, .owner, .producer labels, and adds or overwrites them in the GTS object.
  • In the write token, Warp 10 extracts the labels MAP, which can contain other labels you may have defined when you created the token, and adds or overwrites them in the GTS object.

Once the GTS object is complete, the Warp 10 storage engine computes a hash of the classname and all the label names and values. If this hash is unknown in the system, it creates a new GTS. If it exists, it updates the values. Then it updates the internal Sensision metrics.

Token content directly contributes to the uniqueness of a GTS in the Warp 10 storage system.

Delete process

When you want to delete a GTS, or an interval of time in a GTS, you also need a write token. What does the Warp 10 storage engine exactly do?

  • Decrypt the token content. If the token is not encrypted with the platform key (warp.aes.token in your configuration) or if it is a read token, it stops there and raises an invalid token error.
  • In the write token, Warp 10 extracts the expiry timestamp. If the token is expired, it stops there with an error.
  • In the write token, Warp 10 extracts the .app, .owner, .producer labels. If the owner is different from the producer UUID, it stops there and raises an error.
  • In the write token, Warp 10 extracts the labels MAP, which can contain other labels you may have defined when you created the token.
  • In the write token, Warp 10 extracts the attributes MAP. If it contains the special key .nodelete, then the delete operation is canceled even if the owner is the same as the producer.

The input selector is completed with .app, .owner, .producer and labels. Then it performs a dry delete. If the number of GTS found is less than the specified input, it performs the delete. If it is a full delete, the entry is removed from the directory component of Warp 10. Then it updates the internal Sensision metrics.
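From WarpScript, a delete can be sketched with the DELETE function. The parameter order used here (write token, selector, start, end, expected number of GTS) reflects the usual signature but should be checked against the DELETE documentation of your version; the function may also need the delete endpoint to be configured on your instance. The token is a placeholder.

```warpscript
'DataLakeW' 'Wt' STORE // placeholder: use your own write token

// delete the datapoints of testGTS whose timestamps are between 0 and 1,
// expecting the selector to match at most 1 GTS
$Wt 'testGTS{}' 0 1 1 DELETE
```

The last parameter acts as a safety net: if the selector matches more series than expected, nothing should be deleted.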

Meta write process

The meta endpoint is used to update the attributes. Attributes are stored in the directory component of Warp 10. You also need a write token to perform an attribute update. What does the Warp 10 storage engine exactly do?

  • Decrypt the token content. If the token is not encrypted with the platform key (warp.aes.token in your configuration) or if it is a read token, it stops there and raises an invalid token error.
  • In the write token, Warp 10 extracts the expiry timestamp. If the token is expired, it stops there with an error.
  • In the write token, Warp 10 extracts the .app, .owner, .producer labels, and adds or overwrites them in the GTS object.
  • In the write token, Warp 10 extracts the labels MAP, which can contain other labels you may have defined when you created the token, and adds or overwrites them in the GTS object.
  • The attributes contained in the write token have no link with GTS attributes. Token attributes are just a way to add more information to a token.

Once the GTS object is complete, the Warp 10 storage engine computes a hash of the classname and all the label names and values. If this hash is unknown in the system, it does nothing. If it exists, it updates the attribute values.

'DataLakeW' 'Wt' STORE // store write token
'DataLakeR' 'Rt' STORE // store read token
NEWGTS 'testGTS' RENAME { 'attr' 'yes' } SETATTRIBUTES $Wt META
NEWGTS 'testGTS_2' RENAME { 'attr' 'yes' } SETATTRIBUTES $Wt META
// testGTS_2 was never updated, it does not exist in the directory.
[ $Rt '~testGTS.*' {} NOW -10 ] FETCH
// testGTS has a new attribute.
// testGTS_2 was not created by META.

Read process

The fetch endpoint is used to read GTS from the Warp 10 storage engine. What does the Warp 10 storage engine exactly do?

  • Decrypt the token content. If the token is not encrypted with the platform key (warp.aes.token in your configuration) or if it is a write token, it stops there and raises an invalid token error.
  • In the read token, Warp 10 extracts the expiry timestamp. If the token is expired, it stops there with an error.
  • In the read token, Warp 10 extracts the .app and .owner labels. This application and owner are not used for access control, but for the billing system in the internal Sensision metrics.
  • In the read token, Warp 10 extracts the labels, owners, producers, and applications lists, if they exist. If these lists do not exist, the read token is considered a wildcard token.

The Warp 10 storage engine computes a hash of the classnames and all the label names and values, and asks the directory component for their locations. Then it performs the fetch and returns a list of GTS. Then it updates the internal Sensision metrics for the .owner of the token.

Tokens can restrict finely what you are allowed to read. Read tokens are designed to set up a pay as you fetch system.

GTS storage from WarpScript

UPDATE

The following WarpScript builds 50 GTS and stores them into your local Warp 10 instance:

'DataLakeW' 'Wt' STORE //store write token
'DataLakeR' 'Rt' STORE //store read token
[] // empty list on the stack to push new gts into
1 50 //create 50 sensors data
<%
  'i' STORE
  NEWGTS 'Zmagneticflux' RENAME
  { 'unit' 'T' 'id' 'sensor' $i TOSTRING + } RELABEL //create labels for each gts
  RAND 'amplitude' STORE
  RAND 0.2 * 'period' STORE
  1 1000 //with 1000 points at 1hz, starting at unix epoch
  <%
    't' STORE
    $t $period * SIN $amplitude * 'value' STORE
    $t s NaN NaN NaN $value ADDVALUE
  %> FOR
  +!
%> FOR //list of 50 gts on the top of the stack
$Wt UPDATE //push it into the database

The classname is the same (same sensors) and the unit is the same, but the sensor ID makes each GTS unique. As the timestamps are fixed, from 1 s to 1000 s after the Unix epoch, running the same script again just overwrites the data in the storage.

FETCH

FETCH instruction takes a MAP input.

{
  'token'    token<STRING>
  'class'    classname<STRING>
  'labels'   <MAP>
  'end'      x        // timestamp<LONG> or ISO8601<STRING>
  'timespan' t<LONG>  // timespan or start
  'start'    y        // timestamp<LONG> or ISO8601<STRING>
  'count'    z<LONG>  // optional: limit to z points per gts

  // lots of other options are possible. see FETCH documentation for pagination, gts pagination, and more fetch options
  // here are some very useful ones:
  'boundary' 1 // optional: also fetch one point before and after required timerange
  'timestep'  5 s // optional: one point every 5 seconds. see also 'sample' for faster random selection.
  'step'  10 // optional: returns one point out of 10
} FETCH
  • Token must be a valid read token
  • classname can be a regular expression. It allows you to select several classnames, based on a pattern or on alternations. If you want to use a regular expression, start your string with ~. Here is a list of valid classname selectors:
'sensor1' //only sensor1. no need for a regular expression
'~.*'  //all the possible classnames (dot means any character) (star means zero or more)
'~input.*' //all the classnames starting with input
'~.*temperature.*' // all the classnames which contains temperature
'~sensor[1-9]'   //sensor1, sensor2, ... sensor8, sensor9
'~sensor(3|4|9)' //sensor3, sensor4 and sensor9
'~(temperature28|temperature29|sensor2|sensor4|sensor42)' //logical OR to select only these classnames
  • labels is a MAP of labels. It can contain a mix of GTS attributes and labels: there is no difference in the syntax. It can also contain regular expressions. Here is a list of valid label selectors:
{} //every possible labels
{ 'unit' 'km/h'  'type' 'sedan' } //select only data in km/h of sedan type cars
{ 'vin' '~VF1BK5.*' } //select only vehicle numbers starting with VF1BK5
  • You can choose a combination of parameters to build your request:
    • start and end
    • start and end and count: fetch will stop when count is reached, or when start is reached.
    • end and timespan for a better time range expression
    • end and timespan and count
    • end and count when you want to find the last datapoints in the past, but you do not know when exactly.
    • add more options, such as timestep to subsample data, or boundary to fetch one point before and after the time range, or pagination options such as skip, gskip, gcount, or activity tracking options such as active.after or quiet.after

FETCH will check if the specified combination is valid.

FETCH always returns a list of GTS. The list may be empty.

'DataLakeR' 'Rt' STORE // store read token

// Fetch every datapoint in the past (MAXLONG is the greatest possible positive value)
{
  'token' $Rt
  'class' 'Zmagneticflux'
  'labels' { 'id' '~sensor2[1-2]' }
  'end' NOW // timestamp or ISO8601 string
  'count' MAXLONG
} FETCH

// Fetch only the last 100 points
{
  'token' $Rt
  'class' 'Zmagneticflux'
  'labels' { 'id' 'sensor8' }
  'end' NOW // timestamp or ISO8601 string
  'count' 100
} FETCH

// Fetch the last minute of data, from now
{
  'token' $Rt
  'class' 'Zmagneticflux'
  'labels' { 'id' '~sensor2.' }
  'end' NOW
  'timespan' 1 m
} FETCH

// Fetch from minute 6 to minute 8 and 30 s
{
  'token' $Rt
  'class' 'Zmagneticflux'
  'labels' { 'id' 'sensor42' }
  'start' '1970-01-01T00:06:00.0Z'
  'end' '1970-01-01T00:08:30.0Z'
} FETCH
  • FETCH always starts from the end and walks back in time, finding records in the database. But there is no guarantee that the output is sorted. The SORT function will help.
  • FETCH can also take a list as input. This signature is still supported, but it is less readable and does not allow advanced FETCH options.
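Since the order of the fetched datapoints is not guaranteed, one way to get chronologically sorted series is to apply SORT to every GTS of the returned list. This is a minimal sketch reusing the Zmagneticflux series created earlier, with a placeholder read token.

```warpscript
'DataLakeR' 'Rt' STORE // placeholder: use your own read token

{
  'token' $Rt
  'class' 'Zmagneticflux'
  'labels' { 'id' 'sensor8' }
  'end' NOW
  'count' 100
} FETCH

// LMAP pushes each GTS, then its index: drop the index and sort the GTS by tick
<% DROP SORT %> LMAP
```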

GTS attributes

To read only GTS attributes, the fastest way is to use the FIND function, which queries only the directory component.

'DataLakeR' 'Rt' STORE
{ 'token' $Rt 'class' 'testGTS' 'labels' {} } FIND

You should have one GTS with an attribute named 'attr' from a previous example. Use SETATTRIBUTES to change, add, or erase attributes, then META to write the attributes into the Warp 10 storage.

'DataLakeR' 'Rt' STORE
'DataLakeW' 'Wt' STORE
{ 'token' $Rt 'class' 'testGTS' 'labels' {} } FIND
0 GET
{ 'tutorial' 'MasterGTS' } SETATTRIBUTES
$Wt META

If you run the previous example again, you should now have a new attribute.

Note that the input map parameters are the same for FIND and FETCH. You can replace FETCH with FIND in your code when you need to debug: the FETCH-specific parameters will be ignored by FIND.

By default, META erases the existing attributes and replaces them with those provided. You can set up Warp 10 to accept differential meta updates with the ingress.attributes.allowdelta = true option. Once done, you can use METADIFF to add or update attributes without removing the existing ones.
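A differential update can be sketched as below. It assumes that ingress.attributes.allowdelta = true is set on your instance and that testGTS already exists in the directory; the 'reviewed' attribute is a made-up name for the illustration and the token is a placeholder.

```warpscript
'DataLakeW' 'Wt' STORE // placeholder: use your own write token

// only the attribute to add is provided;
// with METADIFF the attributes already stored should be kept
NEWGTS 'testGTS' RENAME
{ 'reviewed' 'true' } SETATTRIBUTES
$Wt METADIFF
```

Running FIND on testGTS afterwards is a simple way to confirm that the previous attributes are still there alongside the new one.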

Explore your database

The directory indexes the GTS in the Warp 10 storage. If you want to build an interactive GUI, use the FINDSETS function. For a selection, it returns every possible attribute, label, and classname. When the end user selects a label, a classname, or an attribute, you can quickly update the other drop-down menu elements.

'DataLakeR' 'Rt' STORE [ $Rt '~.*' {} ] FINDSETS //find everything in the token scope

On a distributed version, FINDSTATS extracts cardinality statistics, which can be very useful when working with Warp 10 experts.


We also talk about GTS on the blog.


Key points

  • A GTS is unique in the storage if its combination of classname and label values is unique.
  • Token content adds some labels.
  • For your security, you cannot UPDATE a GTS if you did not rename it after a FETCH.
  • Attributes are sticky notes you can put on or remove from a GTS. Attributes do not affect GTS uniqueness.
  • In a FETCH or a FIND, you can also filter by attributes.