
# Re: Efficient and smart storage of time series

Tue, 2012-01-10, 14:07

Dear Tim,

My typical use case is the following...

I want to store data from time 0 to ten years with four points per year, then from ten years to thirty with one point per year. It is never more than a few hundred points. The goal here is to reduce the number of points on which my algorithm works, by changing the sampling interval. Therefore a more complex data structure is needed and I am worried I

I access the data sequentially in most cases (but skipping points, like taking points 3, 6, 9 and so on), and in other cases with random access.

It must be as fast as possible, no space constraints.

Thank you for your help

Best regards
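
One way to read the use case above, as a minimal sketch rather than a definitive answer: keep all values in a single primitive array and precompute a month-to-index table, which spends memory to get constant-time random access. The names `Segment` and `PiecewiseSeries` are hypothetical, and the sketch assumes time is counted in whole months and that a lookup returns the last sample at or before the requested month, as `values(month/pointsFrequency)` does in the original `Data` class.

```scala
// Hypothetical names throughout: Segment and PiecewiseSeries do not come from the thread.

/** A run of equally spaced samples covering months [startMonth, endMonth). */
final case class Segment(startMonth: Int, endMonth: Int, stepMonths: Int) {
  require(stepMonths > 0 && endMonth > startMonth)
  require((endMonth - startMonth) % stepMonths == 0)
  def size: Int = (endMonth - startMonth) / stepMonths
}

final class PiecewiseSeries(segments: IndexedSeq[Segment], values: Array[Double]) {
  // Assumption: segments start at month 0 and are contiguous.
  require(segments.nonEmpty && segments.head.startMonth == 0)
  require(segments.zip(segments.tail).forall { case (a, b) => a.endMonth == b.startMonth })
  require(values.length == segments.map(_.size).sum)

  // One table entry per month: memory is traded for O(1) lookup.
  private val indexOfMonth: Array[Int] = {
    val table = new Array[Int](segments.last.endMonth)
    var offset = 0
    for (s <- segments) {
      var m = s.startMonth
      while (m < s.endMonth) {
        table(m) = offset + (m - s.startMonth) / s.stepMonths
        m += 1
      }
      offset += s.size
    }
    table
  }

  /** Number of stored points. */
  def length: Int = values.length

  /** Direct access by point index, for sequential or strided loops. */
  def valueAt(i: Int): Double = values(i)

  /** Random access by month: the last sample at or before `month`. */
  def apply(month: Int): Double = values(indexOfMonth(month))
}
```

With `Vector(Segment(0, 120, 3), Segment(120, 360, 12))` this stores 40 quarterly points followed by 20 yearly points, 60 in total. Taking every third point is then a plain `while` loop over `valueAt(i)` with `i += 3`.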

------Original message------

From: Tim Pigden

To: Edmondo

Cc: scala-user

Subject: Re: [scala-user] Efficient and smart storage of time series

Sent: 10 Jan 2012 11:29

Hi Edmondo

Important questions that would help understand what you want

a) how much data are we talking about

b) how do you process it (sequentially, random search by time interval ...)

c) how space efficient or fast does it really need to be?

d) are you accessing all the values or just sampling

e) what exactly do you mean by low t and high t in

> for low t I want to store very

> frequent data, for higher t I want to store less frequent data.

On 10 January 2012 10:21, Edmondo Porcu <edmondo [dot] porcu [at] gmail [dot] com> wrote:

> Dear all,

> I have the following use case, and I would like to hear your suggestions.

>

> I have to store data in t,y where t is a time instant and y is the value of

> y=f(t)

>

> In a simple case, since my t were equidistant in time, I could store that

> efficiently in an array.

>

> class Data(values:Array[Double], pointsFrequency:Int) {

>

> final def apply(month:Int) = values(month/pointsFrequency);

>

> }

>

>

> Imagine now I have the following case: for low t I want to store very

> frequent data, for higher t I want to store less frequent data.

>

> I end up with a ComplexData

>

> class ComplexData(subdata:IndexedSeq[Data]) {

>

> final def apply(month:Int)

>

> }

>

> What is the best implementation you can imagine ? :)

>

> Best Regards

>

>

--

Tim Pigden

Optrak Distribution Software Limited

+44 (0)1992 517100

http://www.linkedin.com/in/timpigden

http://optrak.com

Optrak Distribution Software Ltd is a limited company registered in

England and Wales.

Company Registration No. 2327613 Registered Offices: Orland House,

Mead Lane, Hertford, SG13 7AT England

Sent from BlackBerry(R) Wireless Handheld

Tue, 2012-01-10, 14:31

#2
Re: Efficient and smart storage of time series

Dear all, the situation is the following: I am doing some numerical optimization, and the cost of the optimization algorithm scales as iterations * N variables * M functions.

It typically involves computing the Jacobian matrix of a multivariable function and performing a matrix-vector multiplication at each step.

For simplicity of storage, we were performing the optimization on an equispaced dataset, as described before. As a result, the size of the problem was typically on the order of 380 functions and 360 variables (30 years, one per month), where functions = 20 + variables - 2.

We have realized that we can solve the problem with sufficient accuracy using 150 variables, and therefore 168 functions, which reduces the complexity, but we need a smart way to access the variables, which are no longer equidistant.
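
For variables that are no longer equidistant, one option is to store the sample times explicitly next to the values and locate a time by binary search. The following is a minimal sketch under that assumption; `IrregularSeries` and its method names are hypothetical, times are taken to be whole months, and the only library call is the JDK's `java.util.Arrays.binarySearch`.

```scala
import java.util.Arrays

// Hypothetical sketch: IrregularSeries is an illustrative name, not from the thread.
final class IrregularSeries(times: Array[Int], values: Array[Double]) {
  require(times.nonEmpty && times.length == values.length)
  require((1 until times.length).forall(i => times(i - 1) < times(i)),
    "times must be strictly increasing")

  /** Number of stored points (the optimization variables). */
  def size: Int = values.length

  /** Time and value of the i-th point, for index-based loops. */
  def time(i: Int): Int = times(i)
  def value(i: Int): Double = values(i)

  /** Index of the last sample at or before `month`, or -1 if `month` precedes all samples. */
  def indexAtOrBefore(month: Int): Int = {
    val i = Arrays.binarySearch(times, month)
    // On a miss, binarySearch returns -(insertionPoint) - 1,
    // so the element just before the insertion point is -i - 2.
    if (i >= 0) i else -i - 2
  }

  /** Random access by month, O(log n) in the number of points. */
  def apply(month: Int): Double = {
    val i = indexAtOrBefore(month)
    require(i >= 0, "month lies before the first sample")
    values(i)
  }
}
```

With around 150 points a lookup needs at most about eight comparisons. Code that iterates over variables by index can skip the search entirely and use `time(i)` and `value(i)`.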

Thank you for your help

Best Regards

2012/1/10 Tim Pigden <tim [dot] pigden [at] optrak [dot] com>

Edmondo

If you know which point in the sparse years you want to

sample (e.g. Q1) and you know that in your algorithm, I would have

thought an array with constant time intervals across the whole data

set and interpolated values for the higher end years would be most

efficient - it allows all your accesses to be direct access to a

primitive array of doubles - which is undoubtedly as fast as you're

likely to get. In your algorithm you exercise the selectivity for

higher end years.
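
The uniform-array idea can be sketched as follows. This is an illustration only: it assumes linear interpolation (the message does not say which kind), sample times in whole months that are multiples of the grid step, and a hypothetical helper named `DenseGrid.interpolate`.

```scala
// Hypothetical sketch: DenseGrid is an illustrative name, not from the thread.
object DenseGrid {
  /** Builds a uniform grid with `step`-month spacing from sparse samples,
    * filling the gaps by linear interpolation (an assumption of this sketch).
    * `times` must start at 0, be strictly increasing and be multiples of `step`.
    */
  def interpolate(times: Array[Int], values: Array[Double], step: Int): Array[Double] = {
    require(step > 0 && times.nonEmpty && times.length == values.length)
    require(times.head == 0 && times.forall(_ % step == 0))
    require((1 until times.length).forall(i => times(i - 1) < times(i)))

    val dense = new Array[Double](times.last / step + 1)
    var k = 0
    while (k < times.length - 1) {
      val t0 = times(k)
      val t1 = times(k + 1)
      val v0 = values(k)
      val v1 = values(k + 1)
      var t = t0
      while (t < t1) {
        val w = (t - t0).toDouble / (t1 - t0) // 0.0 at t0, approaching 1.0 towards t1
        dense(t / step) = v0 + w * (v1 - v0)
        t += step
      }
      k += 1
    }
    dense(times.last / step) = values.last
    dense
  }
}
```

Every access is then `dense(month / step)` on a primitive array of doubles, and the algorithm decides for itself which indices it reads in the later years.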

Or if you want to have the data tell you there is no value (Year 20 Q2

has no data) then simply insert a negative number or something like that

and check for that code.
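
A minimal sketch of the marker-value variant. It swaps the negative number for `Double.NaN`, which is an assumption of this sketch made in case negative values are legitimate data; `SentinelGrid` and its members are hypothetical names.

```scala
// Hypothetical sketch: SentinelGrid is an illustrative name, not from the thread.
object SentinelGrid {
  /** Marker for "no data at this grid point" (NaN instead of a negative number). */
  val Missing: Double = Double.NaN

  /** NaN never compares equal to anything, so it needs a dedicated test. */
  def isMissing(x: Double): Boolean = java.lang.Double.isNaN(x)

  /** Uniform grid of `length` slots spaced `step` months apart;
    * slots without a sample hold the marker.
    */
  def fill(times: Array[Int], values: Array[Double], step: Int, length: Int): Array[Double] = {
    require(step > 0 && times.length == values.length)
    val dense = Array.fill(length)(Missing)
    var k = 0
    while (k < times.length) {
      dense(times(k) / step) = values(k)
      k += 1
    }
    dense
  }

  /** Example consumer: sums only the slots that carry data. */
  def sumPresent(dense: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < dense.length) {
      if (!isMissing(dense(i))) sum += dense(i)
      i += 1
    }
    sum
  }
}
```

The check costs one branch per access, which is the price of letting the data say where the gaps are.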

Any other structures would likely lead to a degree of indirection as

you decide which of 2 data structures you will pull the data from.

Space is clearly not an issue.

Personally I would question whether the effort of having the higher

years as sparse data, given we're only talking about a few extra

values, is worth the coding complexity of treating it differently. But

then I haven't a clue what you're actually doing with it!

A more uniform algorithm and data set might make it easier to

introduce parallelisation or other techniques that might get more

significant speed-ups.

