
Re: Efficient and smart storage of time series

edmondo1984
Dear Tim,
My typical use case is the following...

I want to store data from time 0 to ten years with four points per year, then from ten years to thirty with one point per year. It is never more than a few hundred points. The goal here is to reduce the number of points on which my algorithm works, by changing the sampling interval. Therefore a more complex data structure is needed and I am worried I

 In most cases I access the data sequentially (but skipping points, like taking points 3, 6, 9 and so on), and in other cases with random access.
It must be as fast as possible; there are no space constraints.

Thank you for your help
Best regards
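
One way to read the layout described above, as a minimal sketch only: the first ten years are stored every 3 months and the following twenty years every 12 months, all in a single array, with a month-based lookup. The class name TwoRateSeries, the month index and the 360-month horizon are assumptions made for illustration, not something stated in the thread.

```scala
// Hypothetical sketch: names and constants are assumptions, not from the thread.
final class TwoRateSeries(values: Array[Double]) {
  private val quarterStep = 3    // months between points in the dense part
  private val yearStep    = 12   // months between points in the sparse part
  private val splitMonth  = 120  // month at which the sampling rate changes (year 10)
  private val endMonth    = 360  // horizon (year 30)

  // 40 quarterly points followed by 20 yearly points
  private val denseCount = splitMonth / quarterStep
  val size: Int = denseCount + (endMonth - splitMonth) / yearStep

  require(values.length == size, s"expected $size values, got ${values.length}")

  /** Index of the stored point covering `month` (0 until 360). */
  def indexOf(month: Int): Int =
    if (month < splitMonth) month / quarterStep
    else denseCount + (month - splitMonth) / yearStep

  /** Value of the stored point covering `month`. */
  def apply(month: Int): Double = values(indexOf(month))
}
```

Random access stays a single branch plus an integer division, and a strided sequential pass (every third point, say) can simply walk the underlying indices 0 until size.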
------Original message------
From: Tim Pigden
To: Edmondo
Cc: scala-user
Subject: Re: [scala-user] Efficient and smart storage of time series
Sent: 10 Jan 2012 11:29

Hi Edmondo
Important questions that would help me understand what you want:
a) how much data are we talking about?
b) how do you process it (sequentially, random search by time interval ...)?
c) how space efficient or fast does it really need to be?
d) are you accessing all the values or just sampling?
e) what exactly do you mean by low t and high t in
> for low t I want to store very
> frequent data, for higher t I want to store less frequent data.


On 10 January 2012 10:21, Edmondo Porcu <edmondo [dot] porcu [at] gmail [dot] com> wrote:
> Dear all,
> I have the following use case, and I would like to hear your suggestions.
>
> I have to store data in t,y where t is a time instant and y is the value of
> y=f(t)
>
> In a simple case, since my t were equidistant in time, I could store that
> efficiently in an array.
>
> class Data(values: Array[Double], pointsFrequency: Int) {
>
>   // pointsFrequency is the spacing between stored points, in months
>   final def apply(month: Int): Double = values(month / pointsFrequency)
>
> }
>
>
> Imagine now I have the following case: for low t I want to store very
> frequent data, for higher t I want to store less frequent data.
>
> I end up with a ComplexData
>
> class ComplexData(subdata:IndexedSeq[Data]) {
>
> final def apply(month:Int)
>
> }
>
> What is the best implementation you can imagine ? :)
>
> Best Regards
>
>



--
Tim Pigden
Optrak Distribution Software Limited
+44 (0)1992 517100
http://www.linkedin.com/in/timpigden
http://optrak.com
Tim P
Re: Efficient and smart storage of time series

Edmondo

If you know which point in the sparse years you want to
sample (e.g. Q1) and your algorithm knows that too, I would have
thought an array with constant time intervals across the whole data
set, with interpolated values for the higher-end years, would be most
efficient. It allows all your accesses to be direct accesses to a
primitive array of doubles, which is undoubtedly as fast as you're
likely to get. In your algorithm you exercise the selectivity for
the higher-end years.
Or if you want the data to tell you there is no value (Year 20 Q2
has no data), then simply insert a negative number or something like that
and check for that code.

Any other structure would likely lead to a degree of indirection as
you decide which of the 2 data structures to pull the data from.

Space is clearly not an issue.

Personally I would question whether the effort of having the higher
years as sparse data, given we're only talking about a few extra
values, is worth the coding complexity of treating them differently. But
then I haven't a clue what you're actually doing with it!

A more uniform algorithm and data set might make it easier to
introduce parallelisation or other techniques that might get more
significant speed-ups.
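
A rough sketch of the constant-interval idea above: the known sample months and their values are expanded into one dense array at a fixed step, with the gaps filled by linear interpolation. The object name DenseGrid, the method signature and the choice of linear interpolation are assumptions for illustration; the thread does not say which interpolation scheme is intended. The sketch also assumes the months are strictly increasing and aligned to the step.

```scala
// Hypothetical sketch: DenseGrid and its signature are assumptions, not from the thread.
object DenseGrid {
  /** Expands samples at `months` (strictly increasing, multiples of `step` apart)
    * into a dense array with one entry every `step` months, filling the gaps
    * between known samples by linear interpolation.
    */
  def interpolate(months: Array[Int], values: Array[Double], step: Int): Array[Double] = {
    require(months.nonEmpty && months.length == values.length, "one value per month is required")
    require(step > 0, "step must be positive")
    val start = months.head
    val n = (months.last - start) / step + 1
    val out = new Array[Double](n)
    var seg = 0 // index of the last known sample at or before the current month
    var i = 0
    while (i < n) {
      val m = start + i * step
      while (seg + 1 < months.length && months(seg + 1) <= m) seg += 1
      out(i) =
        if (seg + 1 == months.length) values(seg) // reached the last known sample
        else {
          val t0 = months(seg)
          val t1 = months(seg + 1)
          val w = (m - t0).toDouble / (t1 - t0) // 0.0 exactly on a known sample
          values(seg) + w * (values(seg + 1) - values(seg))
        }
      i += 1
    }
    out
  }
}
```

If the gaps should stay visible instead, the same loop could write a marker value in place of the interpolated one. Double.NaN is one alternative to a negative number, as long as callers test it with isNaN rather than ==.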


edmondo1984
Re: Efficient and smart storage of time series
Dear all, the situation is the following: I am doing some numerical optimization, and the cost of the optimization algorithm grows as iterations * N variables * M functions.
It typically involves computing the Jacobian matrix of a multivariable function and performing a matrix-vector multiplication at each step.
For simplicity of storage, we were performing the optimization on an equispaced dataset, as described before. As a result, the size of the problem was typically on the order of 380 functions and 360 variables (30 years, one per month), where functions = 20 + variables - 2.
We have realized we can solve the problem with sufficient accuracy using 150 variables and therefore 168 functions, which reduces the complexity, but we need a smart way to access the variables, which are no longer equidistant.
Thank you for your help
Best Regards
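
For variables that are no longer equidistant, one possible approach (a sketch, not a recommendation from the thread) is to precompute a month-to-variable table once, since space is not a constraint, so that each access is a single array read. The names NonUniformGrid, knots, horizon and ProblemSize are hypothetical. The only thing taken from the message above is the relation functions = 20 + variables - 2.

```scala
// Hypothetical sketch: all names are assumptions, not from the thread.
final class NonUniformGrid(knots: Array[Int], horizon: Int) {
  require(knots.nonEmpty && knots.head == 0, "first knot must be month 0")
  require(knots.sliding(2).forall(p => p.length < 2 || p(0) < p(1)),
    "knots must be strictly increasing")
  require(knots.last < horizon, "all knots must lie before the horizon")

  // lookup(m) is the index of the last knot at or before month m
  private val lookup: Array[Int] = {
    val table = new Array[Int](horizon)
    var k = 0
    var m = 0
    while (m < horizon) {
      while (k + 1 < knots.length && knots(k + 1) <= m) k += 1
      table(m) = k
      m += 1
    }
    table
  }

  /** Number of variables, one per knot. */
  def variableCount: Int = knots.length

  /** Index of the variable covering `month` (0 until horizon), in constant time. */
  def indexOf(month: Int): Int = lookup(month)
}

object ProblemSize {
  /** Relation quoted in the message above: functions = 20 + variables - 2. */
  def functionCount(variables: Int): Int = 20 + variables - 2
}
```

The table costs one Int per month of the horizon and removes any search or branching from the inner loop of the optimization.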








Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland