TimeSeries and TimeSeries Collections (Matlab)

From LiteratePrograms

Since release R14SP3 (mid-2005), MATLAB has offered two new objects:

  • timeseries
  • tscollection

They implement a way to use structured datasets.


Core objects

TimeSeries

A timeseries is defined by at least:

  • a name
  • a date vector
  • a column of values

Plot of a MATLAB time-series

For instance:

ts1 = timeseries(cumsum(randn(100,1)),(1:100)','name','my first dataset');

To plot such an object, we can use the overloaded plot function with the usual options:

plot(ts1, 'linewidth',2);

Some properties of the timeseries are directly accessible:

  • the date vector, through |ts1.Time|
  • the vector of values, through |ts1.Data|
  • the name, through |ts1.Name|

The more generic |get| function allows access to more properties:

>> get(ts1)
             Events: []                       
               Name: 'first'                  
               Data: [100x1 double]           
           DataInfo: [1x1 tsdata.datametadata]
               Time: [100x1 double]           
           TimeInfo: [1x1 tsdata.timemetadata]
            Quality: []                       
        QualityInfo: [1x1 tsdata.qualmetadata]
        IsTimeFirst: true                     
  TreatNaNasMissing: true

TimeSeries Events

A special property called |Events| can be used to store sparse information about the timeseries.
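
As an illustration, here is a minimal sketch of attaching an event to a timeseries. It assumes the tsdata.event constructor and the addevent function as described in the MathWorks documentation; the event name and time are hypothetical values chosen for the example.

```matlab
% Build a small timeseries (illustrative data)
ts = timeseries(cumsum(randn(100,1)), (1:100)', 'name', 'with events');

% Create an event: 'regime change' is a hypothetical name, 40 is its time
ev = tsdata.event('regime change', 40);

% Attach the event; it is stored in the Events property
ts = addevent(ts, ev);

% Read the stored event back
disp(ts.Events(1).Name)
disp(ts.Events(1).Time)
```

Events are stored next to the data rather than inside it, so they do not change the values of the series.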

TSCollection

A TScollection is a collection of synchronized timeseries. It has its own name.

Plot of a MATLAB TSCollection using tsc_plot

To build a TSCollection, you only need some synchronized timeseries (timeseries sharing the same time vector), as in the following code block:

ts1 = timeseries(cumsum(randn(100,1)),   (1:100)','name','one');
ts2 = timeseries(cumsum(randn(100,1)*.5),(1:100)','name','two');
tsc = tscollection({ts1, ts2}, 'Name', 'my first TSCollection');

Unfortunately, the plot function does not work on TSCollections, so we need to write our own plotting function:

function h = tsc_plot( tsc, varargin)
% TSC_PLOT - plot TSCollection
% example:
%  tsc_plot(tsc, 'linewidth',2)
h = figure;
names  = gettimeseriesnames(tsc);
colors = 'brmckyg';
for n=1:length(names)
   ts = tsc.(names{n});
   % cycle through the colors; mod(n-1,...)+1 keeps the index between 1 and length(colors)
   plot(tsc.Time, ts.Data, colors(mod(n-1,length(colors))+1), varargin{:});
   hold on
end
hold off
legend(gca, names);

TimeSeries names in a TSCollection

Once timeseries are placed in a TSCollection object, their names are translated: every character other than an ASCII letter, digit or underscore is replaced by its hexadecimal code, prefixed with 0x. For instance:

>> tscollection(timeseries((1:10)',(1:10)','name','anycharaters(\_)'))
Time Series Collection Object: unnamed
Time vector characteristics
     Start time            1 seconds
     End time              10 seconds
Member Time Series Objects:

This clearly makes it harder to retrieve your timeseries. It is possible to write a function that implements this translation, and to use it to retrieve timeseries by their original names:

function z = translate4tsc(op, s)
% TRANSLATE4TSC - translation in two directions:
%  'anycharaters0x280x5C_0x29' = translate4tsc('to-tsc',  'anycharaters(\_)')
%  'anycharaters(\_)'          = translate4tsc('from-tsc', 'anycharaters0x280x5C_0x29')
switch lower(op)
    case {'20xhex', 'to-tsc'}
        % convert to hex
        code2keep = [48:57,65:90,97:122,95];   % digits, A-Z, a-z, underscore
        t     = double(s);
        t     = t(:);
        ikeep = ismember(t,code2keep);
        iconv = ~ikeep;
        if all(ikeep)
            z = s;
        else
            % one column of 4 characters per input character
            z = repmat(' ',4,length(s));
            % the extra value 100 (0x64) forces dec2hex to return two digits
            tmp = dec2hex([t(iconv);100]);
            z(3:4,iconv) = tmp(1:end-1,:)';
            z(1,iconv) = '0';
            z(2,iconv) = 'x';
            z(1,ikeep) = s(ikeep);
            z = strrep(z(:)',' ','');
        end
    case {'0x2str', 'from-tsc'}
        % convert 0x hex to string
        idx = strfind(s, '0x');
        if isempty(idx)
            z = s;
        else
            % decode the two hex digits following each '0x'
            h = char(hex2dec(s([idx(:)+2,idx(:)+3])));
            s(idx+3) = h';
            % drop the '0x' prefix and the first hex digit of each code
            s([idx, idx+1, idx+2]) = [];
            z = s;
        end
    otherwise
        error('translate4tsc:InvalidMode','Invalid mode <%s>',op);
end

This function is not perfect at this stage (it has problems with spaces in names), so feel free to improve it.
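
The encoding rule described above can also be checked with a short self-contained sketch. It only reproduces the rule shown in this article (keep ASCII letters, digits and underscore, replace everything else by 0x followed by two hex digits); it is not the MathWorks implementation, and the variable names are illustrative.

```matlab
name = 'anycharaters(\_)';

% characters kept as-is: digits, A-Z, a-z, underscore
keep = ismember(double(name), [48:57, 65:90, 97:122, 95]);

% encode character by character
parts = cell(1, numel(name));
for k = 1:numel(name)
    if keep(k)
        parts{k} = name(k);
    else
        parts{k} = sprintf('0x%02X', double(name(k)));
    end
end
key = [parts{:}];
disp(key)   % anycharaters0x280x5C_0x29

% Assumption: with a collection tsc built as in the example above,
% the member would then be retrieved with tsc.(key)
```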

Functions

Simple manipulations

TimeSeries

  • getqualitydesc
  • getdatasamplesize
  • Sample manipulations
    • addsample
    • delsample
  • ctranspose
  • detrend
  • filter
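
A minimal sketch of a few of these functions, assuming the name/value syntax given in the MathWorks documentation for addsample and delsample; the data are illustrative.

```matlab
ts = timeseries((1:5)', (1:5)', 'name', 'demo');

% append one sample at time 6
ts = addsample(ts, 'Data', 6, 'Time', 6);

% delete the first sample (by index)
ts = delsample(ts, 'Index', 1);

% size of one data sample
sz = getdatasamplesize(ts);

% remove the mean of the data
tsd = detrend(ts, 'constant');

disp(ts.Time')
disp(tsd.Data')
```

Each function returns a new object, so the result has to be assigned back to a variable.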

TSCollection

  • TimeSeries manipulations
    • addts
    • removets
  • Sample manipulations
    • addsampletocollection
    • delsamplefromcollection
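
The collection functions can be sketched in the same way. This example assumes the signatures given in the MathWorks documentation (addts, removets, delsamplefromcollection); the names 'A', 'B' and 'demo collection' are illustrative.

```matlab
t   = (1:10)';
tsA = timeseries(rand(10,1), t, 'name', 'A');
tsB = timeseries(rand(10,1), t, 'name', 'B');

% start with one member, then add a second one sharing the same time vector
tsc = tscollection(tsA, 'Name', 'demo collection');
tsc = addts(tsc, tsB);
names_after_add = gettimeseriesnames(tsc);

% remove a member by name
tsc = removets(tsc, 'A');
names_after_remove = gettimeseriesnames(tsc);

% delete the first sample of every remaining member
tsc = delsamplefromcollection(tsc, 'Index', 1);

disp(names_after_add)
disp(names_after_remove)
disp(numel(tsc.Time))
```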

More complex operations

Synchronization

Plot of a MATLAB TSCollection using tsc_plot

Synchronization of timeseries is a critical point. Unfortunately, it seems impossible at this stage to synchronize TSCollections directly. To illustrate synchronization, we first need to create two timeseries:

<<create two timeseries>>=
ts1=timeseries(cumsum(randn(100,1)),(1:100)','name','first');
ts2=timeseries(cumsum(randn(51,1)) ,(50:2:150)','name','second');

Then it is possible to try to synchronize them, for instance using the union option:

<<synchronize and plot >>=
[ts1s, ts2s] = synchronize(ts1, ts2,'union');
tsc_plot(tscollection({ts1s, ts2s}), 'linewidth', 2, 'marker', 'o')
hold on; plot(ts1,'.','marker','+','markersize',20)
hold on; plot(ts2,'.r','marker','+','markersize',20)

The plotting options are chosen so that the new (synchronized) values are plotted with o and the original ones with +.
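
The synchronize function accepts other methods besides 'union'. The following minimal sketch assumes the 'intersection' and 'uniform' options behave as described in the MathWorks documentation; the variable names are illustrative.

```matlab
ts1 = timeseries(cumsum(randn(100,1)), (1:100)',    'name', 'first');
ts2 = timeseries(cumsum(randn(51,1)),  (50:2:150)', 'name', 'second');

% keep only the time points present in both series
[ts1i, ts2i] = synchronize(ts1, ts2, 'intersection');

% resample both series on a regular grid with a step of 5 time units
[ts1u, ts2u] = synchronize(ts1, ts2, 'uniform', 'interval', 5);

% after synchronization, both outputs share the same time vector
disp(isequal(ts1i.Time, ts2i.Time))
disp(isequal(ts1u.Time, ts2u.Time))
```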

Plot of the synchronization and creation relative performances
Plot of the relative synchronization performance (obtained here with MATLAB R14SP3; with MATLAB R2006a, the first points (100 to 500 samples) are around twice as fast, which is still far slower than a self-made solution)

The most interesting feature is clearly synchronization. Unfortunately, TSCollections cannot be synchronized (synchronization is only available for timeseries), and a self-made synchronization is far faster than the MathWorks one. The figure on the left shows the relative performance (timing obtained with tic; toc) of synchronizing TSCollections of different sizes versus an equivalent self-made synchronization on simple structures. The figure on the right shows the time ratio between built-in and self-made synchronization (blue) and instantiation (green) for several data sizes (size(.,1) on the x axis).

The results are clear enough:

  • for instantiation, TSCollection is around 1,500 times slower than a self-made structure
  • for synchronization, TSCollection is at least 60 times slower than a self-made one; the ratio is very high for small sizes (around 1,000 times slower) and decreases for the largest sizes (around 100 times).
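
A comparison of this kind can be sketched with tic and toc as follows. This is not the benchmark used for the figures: it times creation and synchronization together, uses a single illustrative size n, and the self-made variant is simply interp1 on the union of the time vectors. The measured ratio will depend on the MATLAB release and the machine.

```matlab
n  = 500;                                   % illustrative size
t1 = (1:n)';           v1 = cumsum(randn(n,1));
t2 = (n/2:2:3*n/2)';   v2 = cumsum(randn(numel(t2),1));

% built-in objects: creation followed by synchronization
tic;
ts1 = timeseries(v1, t1, 'name', 'first');
ts2 = timeseries(v2, t2, 'name', 'second');
[ts1s, ts2s] = synchronize(ts1, ts2, 'union');
t_builtin = toc;

% self-made equivalent on plain vectors
tic;
t0  = union(t1, t2);
v01 = interp1(t1, v1, t0);
v02 = interp1(t2, v2, t0);
t_self = toc;

fprintf('built-in: %.4f s, self-made: %.4f s, ratio: %.1f\n', ...
        t_builtin, t_self, t_builtin / t_self);
```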


Self made TimeSeries equivalent

Because of the inefficiency of the timeseries and TScollection objects, we can try to implement our own equivalents.

Main object

As stated in another article (Swiss army knife MATLAB programs for quantitative finance), we can use a simple structure to store everything we need:

data = struct('title', 'my TScollection title', 'value', cumsum(randn(100,3)), 'date', (now-100+1:now)', ...
              'names', {{'column1', 'column2', 'column3'}})

Here is a simple function to build such objects:

function data = myTSCobject( varargin)
% MYTSCOBJECT - self made efficient TScollection
%  use:
% data = myTSCobject('title', 'my TScollection title', 'value', cumsum(randn(100,3)), ...
%                    'date', (now-100+1:now)', ...
%                    'names', {'column1', 'column2', 'column3'})
data = struct();
% store each name/value pair as a field of the structure
for f=1:2:length(varargin)-1
   field_name  = varargin{f};
   field_value = varargin{f+1};
   data.(field_name) = field_value;
end
if ~isfield(data,'value') || ~isfield(data,'date') || ~isfield(data,'names') || ~isfield(data,'title')
   error('myTSCobject:field', 'fields <value> <date> <names> <title> mandatory for myTSCobject');
end
% one row per date, one column per name
[nv,pv] = size(data.value);
[nd,pd] = size(data.date);
[nn,pn] = size(data.names);
if nv ~= nd || pv ~= pn || nn ~= 1 || pd ~= 1
   warning('myTSCobject:check', 'problem with dimension of fields');
end

Main functions

Here we need at least a synchronization function, which in turn needs an interpolation function. It is surprising that the MATLAB timeseries synchronization function does not use the MATLAB interp1 function: why do twice what has already been done once?

function data0 = mySynchro(data1, data2, varargin)
% MYSYNCHRO - a simple self made synchronization
date0    = union(data1.date, data2.date);
value0_1 = interp1(data1.date, data1.value, date0(:), varargin{:});
value0_2 = interp1(data2.date, data2.value, date0(:), varargin{:});
data0    = myTSCobject('title', 'synchronized dataset', 'date', date0(:), 'value', [value0_1, value0_2], ...
                       'names', {data1.names{:}, data2.names{:}});

Plots of data1, data2, data0 and data0p

Which can be used like this:

dt1   = (1:10:200)';
data1 = myTSCobject('title','A','value',[sin(dt1/100*pi),cos(dt1/100*pi)], ...
                   'names',{'sin', 'cos'},'date',dt1,'plot_style','points')
dt2   = (10:1:200)';
data2 = myTSCobject('title','B','value',[(dt2/1000).^2,1./dt2], ...
                   'names',{'x2', '1/x'},'date',dt2,'plot_style','points')

data0  = mySynchro(data1,data2)
data0p = mySynchro(data1,data2,'nearest')

As you can see, we can now use all the |interp1| options.
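
To make the effect of these options concrete, here is a small self-contained sketch of what the extra arguments forwarded through varargin do in interp1. The sample values are illustrative.

```matlab
x  = [1 2 3];
v  = [10 20 30];
xq = [1.4 2.6 4];      % the last query point lies outside the range of x

v_lin  = interp1(x, v, xq);                     % default: linear, NaN outside the range
v_near = interp1(x, v, xq, 'nearest');          % value of the closest sample
v_ext  = interp1(x, v, xq, 'linear', 'extrap'); % linear with extrapolation

disp(v_lin)    % 14 26 NaN
disp(v_near)   % 10 30 NaN
disp(v_ext)    % 14 26 40
```

With mySynchro, the same arguments are passed after the two datasets, for example mySynchro(data1, data2, 'linear', 'extrap').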
