High Frequency Finance

From LiteratePrograms

Investment strategies are called "high frequency" when they use all the available data rather than a sample of it. In practice this means that intraday data are used (the opposite being extra-day data).

Where to find data

[Figure: Intraday (high-frequency) data from EuroNext]

While extra-day data are easy to download from the internet (see, for instance, Yahoo Finance), intraday data are much harder to find. Some exchanges make the current day's data downloadable from their web site, as EuroNext does here.

<<read_euronext>>=
% Download the day's trades for ISIN FR0000133308 (France Telecom) from EuroNext
raw_data = urlread( [ 'http://www.euronext.com/tools/datacentre/' ...
    'dataCentreDownloadExcell.jcsv?' ...
    'quote=on&volume=on&lan=EN&cha=2593&time=on&' ...
    'selectedMep=1&isinCode=FR0000133308&typeDownload=1&' ...
    'format=xls&indexCompo=']);

% One match per trade: time stamp, trade id, price and volume
% (the decimal point is escaped so that it matches only a literal '.')
parsed_data = regexp( raw_data, [ '(?<date>[0-9][0-9]:[0-9][0-9]:[0-9][0-9])' ...
    '[^0-9]+(?<id>[0-9]+)[^0-9]+(?<price>[0-9]+\.[0-9]+)[^0-9]+(?<volume>[0-9]+)' ], 'names');

% Store the trades in a structure: one row per trade, columns id / Price / Volume
data = struct('title', 'France Telecom the 17th of July', ...
    'value', [str2double({parsed_data.id})', ...
              str2double({parsed_data.price})', ...
              str2double({parsed_data.volume})'], ...
    'date', datenum( {parsed_data.date}, 'HH:MM:SS'), ...
    'colnames', {{ 'id', 'Price', 'Volume'}});

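<<plot_data>>=
% Time of day in hours: fractional part of the serial date divided by one hour
dts = mod(data.date,1)/datenum(0,0,0, 1,0,0);
figure;
% Top panel (3/4 of the height): trade prices
a1=subplot(4,1,1:3);
plot( dts, data.value(:,2), '.k');
ax = axis; axis([min(dts) max(dts) ax(3:4)]);
ylabel(data.colnames{2});
title( data.title)
% Bottom panel: traded volume
a2=subplot(4,1,4);
stem( dts, data.value(:,3),'.','linewidth',2);
ylabel('Volume')
xlabel('Time (hours)')
% Zoom and pan both panels together along the time axis
linkaxes([a1,a2],'x')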

These MATLAB chunks show how to read high-frequency data from the EuroNext web site, store them in a structure, and plot prices and volumes.

Volatility estimation

[Figure: Uniformly sampled data (10-minute step)]

To see how delicate intraday volatility estimation is, we first need code that samples the data at any chosen frequency:

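<<sampling>>=
% Sort the trades by time of day
[data.date, idx] = sort(data.date);
data.value = data.value(idx,:);
% Time of day in hours
dts = mod(data.date,1)/datenum(0,0,0, 1,0,0);
% Regular grid with a 10-minute step (1/6 hour)
dt_step = 1/6;
sample = dts(1):dt_step:dts(end);
% For each grid time, index of the first trade at or after it
% (">=" guarantees a match even when a grid point equals the last trade time)
idx_from = arrayfun(@(d)find(dts>=d,1),sample);
% Plot all trades (black dots) and the sampled prices (red stems)
figure;
plot( dts, data.value(:,2), '.k');
ax = axis; axis([min(dts) max(dts) ax(3:4)]);
xlabel('Time (hours)');
ylabel(data.colnames{2});
title( data.title)
hold on
stem( sample, data.value(idx_from,2), 'or');
hold off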
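The <<sampling>> chunk represents each grid time by a trade that follows it, a "next-tick" rule. A common alternative in the realized-volatility literature is the "previous-tick" rule, which keeps the last trade at or before each grid time, so that the sampled price was already observable at that moment. The sketch below compares both rules on a handful of made-up trades; the times (in minutes) and prices are hypothetical and only serve to make the indices easy to check by hand.

```matlab
% Hypothetical trades: times in minutes after the open (sorted) and prices
t = [0 3 7 18 19 30]';
p = [21.30 21.32 21.31 21.35 21.34 21.40]';

dt_step = 10;                          % sampling step, in minutes
tgrid   = (t(1):dt_step:t(end))';      % regular grid: 0, 10, 20, 30

% Next-tick rule: first trade at or after each grid time
idx_next = arrayfun(@(g) find(t >= g, 1, 'first'), tgrid);

% Previous-tick rule: last trade at or before each grid time
idx_prev = arrayfun(@(g) find(t <= g, 1, 'last'), tgrid);

% Columns: grid time, next-tick price, previous-tick price
disp([tgrid p(idx_next) p(idx_prev)])
```

At the 10-minute grid point, the next-tick rule picks the trade at minute 18, while the previous-tick rule picks the trade at minute 7. Both rules give the same price at the first and last grid points here.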

Changing the sampling rate

[Figure: Volatility estimates depend on the sampling frequency]

The key observation is that the volatility estimate blows up as we use more and more data, that is, as the time step decreases:

  • the fewer points we use, the larger the variance of the estimator (right part of the figure);
  • as we use more data, this estimation noise decreases;
  • but when we try to use all the points (left part), the estimated volatility increases rapidly.

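This is a well-known theoretical result, usually explained by market microstructure noise (bid-ask bounce, price discreteness) that dominates returns at very short time steps (see Jacod, Mykland, Aït-Sahalia, Zhang and others).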
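A short derivation makes the mechanism explicit. It assumes the additive-noise model common in that literature: the observed log price Y is the efficient log price X plus i.i.d. noise ε with mean zero and variance η², independent of X, observed at n equally spaced times over a day of length T. The increment of Y is then ΔX plus (ε_i − ε_{i−1}); the cross term has zero expectation by independence, and the noise difference contributes 2η² per return.

```latex
\begin{aligned}
Y_{t_i} &= X_{t_i} + \varepsilon_{t_i}, \qquad t_i = iT/n, \quad
\mathbb{E}[\varepsilon_{t_i}] = 0, \quad \operatorname{Var}(\varepsilon_{t_i}) = \eta^2, \\
\mathrm{RV}_n &= \sum_{i=1}^{n} \bigl(Y_{t_i} - Y_{t_{i-1}}\bigr)^2, \\
\mathbb{E}\bigl[\mathrm{RV}_n\bigr]
  &= \mathbb{E}\Bigl[\sum_{i=1}^{n} \bigl(X_{t_i} - X_{t_{i-1}}\bigr)^2\Bigr] + 2n\eta^2
  \;\approx\; \mathbb{E}\Bigl[\int_0^T \sigma_s^2 \, ds\Bigr] + 2n\eta^2 .
\end{aligned}
```

With a time step Δ = T/n, the noise term equals 2Tη²/Δ, which grows without bound as Δ shrinks, while the integrated-variance term does not depend on Δ.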

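<<volatility_estimated>>=
% Prices sampled every dt_step hours (first trade at or after each grid time);
% relies on dts and the time-sorted data from <<sampling>>
f_sampled = @(dt_step) data.value(arrayfun(@(d)find(dts>=d,1), dts(1):dt_step:dts(end)), 2);
% Time steps from 1 minute to 4 hours, expressed in hours
sampling = (1:240)/60;
volatilities = nan(length(sampling),1);
for s=1:length(sampling)
    S = f_sampled( sampling(s));
    % Standard deviation of simple returns, scaled to one hour, then annualized
    % (8.5 trading hours per day, 256 trading days per year), in percent
    volatilities(s)=100*std( diff(S)./S(1:end-1))/sqrt(sampling(s))*sqrt(8.5*256);
end
figure;
plot( sampling, volatilities, 'k', sampling, volatilities, '.k')
xlabel('Time step (hours)');
ylabel('Annualized volatility (%)');
title( data.title)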
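The effect can also be reproduced without market data. The sketch below simulates one trading day of tick prices as a Gaussian random walk in log price plus independent, identically distributed measurement noise, then computes the daily realized volatility (square root of the sum of squared log returns, a close relative of the standard-deviation estimator used above) at several sampling steps. The number of ticks, the daily volatility and the noise level are arbitrary assumptions, not values estimated from the EuroNext data.

```matlab
% Hypothetical parameters (not estimated from the EuroNext data)
n_ticks   = 6000;      % number of trades in the simulated day
sigma_day = 0.01;      % daily volatility of the efficient log price
eta       = 0.0005;    % standard deviation of the microstructure noise

% Efficient log price: Gaussian random walk starting at log(20);
% the observed log price adds i.i.d. noise on top of it
x = log(20) + cumsum(sigma_day / sqrt(n_ticks) * randn(n_ticks, 1));
y = x + eta * randn(n_ticks, 1);

steps = [1 5 20 100 300];              % sampling step, in ticks
rv = nan(size(steps));
for k = 1:numel(steps)
    ys = y(1:steps(k):end);            % keep every steps(k)-th observation
    rv(k) = sqrt(sum(diff(ys).^2));    % daily realized volatility
end

% Columns: sampling step (ticks), realized volatility
disp([steps' rv'])
```

The random seed is not fixed, so the numbers change from run to run, but with these parameters the tick-by-tick estimate stays well above the one obtained with a 300-tick step, where the noise term has become small compared with the true daily variance.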