rickman <g...@gmail.com> wrote:

< I understand perfectly your original calculations. They are not
< complex to understand. However, they are much more complex than
< required. That is why I made my post explaining how, in the infinite
< series, your approach approximates the simple sum of the input
< values. The only difference is that your approach can only produce
< (N-1) output values for N input values and, as a consequence, the final
< output will not include the area of one column because you are always
< subtracting out the first one.

< Your approach errs in thinking that the area is defined by the
< points "surrounding" the area. What it ignores is that the points
< are *included* in the area. I suppose that the initial int(0) could be
< accounting for this somehow, without specifying how that
< initialization is to be done.

This is confusing. The points have zero width and integrate
to zero. Left out is the actual limit for the integral. If it
includes one half a sample period before the first point and
one half after the last point, then just add. If it doesn't,
then you need to divide the first and last points by two.

< I made a typo in my original equation, which is equivalent to your more
< complex calculation.

< int(n) = int(n-1) + 0.5 * (f(n) - f(n-1))

< should have been

< int(n) = int(n-1) + 0.5 * (f(n) + f(n-1))

< If you need me to, I can show you in a step-wise manner how this is
< equivalent to your equation,

< int(n) = int(n-1) + 0.5 * diff(f(n), f(n-1)) + min(f(n), f(n-1))

< Other than the end points of a series, your equation adds in half of
< each data point at two separate times. So each point is summed to
< produce the integral. Your calculation simply omits half of each end
< point.

< The mistake that is often made when dealing with discrete time samples
< is thinking that they are the same as the instantaneous values of a
< continuous function. In reality they are already integrals of the
< amplitude and the sample period (1/f). That is why you only need to
< sum them to obtain the integral over a series.

For the sampling theorem to work they must be samples of the
instantaneous value. Otherwise the result is filtered, and
the results will be wrong when used as filter input (or for almost
any other use).

< A simple sum is the correct way to calculate the integral of a
< discrete time data series.

If you can manage the sample points in the center of the bins, yes.
Certainly that is preferred, but not always possible.

-- glen
______________________________
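The two recurrences quoted above can be compared directly. The following is a minimal Python sketch, not code from the thread: the function names are hypothetical, it starts from int(0) = 0 (an assumption, since the thread notes the initialization is unspecified), and it assumes `diff` in the blog's formula means the absolute difference |f(n) - f(n-1)|. It also checks glen's end-point remark: adding every sample gives the trapezoidal total plus half of each end sample.

```python
def trapezoid_running(samples):
    """Running integral via int(n) = int(n-1) + 0.5 * (f(n) + f(n-1)).

    Assumption: int(0) = 0.
    """
    out = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        out.append(out[-1] + 0.5 * (cur + prev))
    return out


def blog_running(samples):
    """Same recurrence written as min(f(n), f(n-1)) + 0.5 * diff(f(n), f(n-1)).

    Assumption: 'diff' is the absolute difference |f(n) - f(n-1)|.
    """
    out = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        out.append(out[-1] + min(cur, prev) + 0.5 * abs(cur - prev))
    return out


samples = [3.0, 4.0, 5.0, 6.0, 2.0, 7.0]
trap = trapezoid_running(samples)
blog = blog_running(samples)

# The two recurrences agree step by step.
print(trap == blog)  # True
# Plain sum minus trapezoidal total equals half of each end sample.
print(sum(samples) - trap[-1], 0.5 * (samples[0] + samples[-1]))  # 5.0 5.0
```

Since min(a, b) + 0.5 * |a - b| is just (a + b) / 2, the agreement holds for any input, not only these sample values.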
On Jul 6, 1:33 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> rickman <gnu...@gmail.com> wrote:
>
> (snip)
>
> For the sampling theorem to work they must be samples of the
> instantaneous value. Otherwise the result is filtered, and
> the results will be wrong when used as filter input (or for almost
> any other use).
>
> < A simple sum is the correct way to calculate the integral of a
> < discrete time data series.
>
> If you can manage the sample points in the center of the bins, yes.
> Certainly that is preferred, but not always possible.
>
> -- glen

Glen,

What you have said is a self-contradiction. If there is no averaging
at all (an impossibility: all measurements are made within a finite
time window) and the point measures a value at an exact point in time,
then there is no bin to be in the center of or at the edge of. They
are just measurements of a point in time. What would define the bin?
In fact, what *IS* a bin in this context?

It is a matter of perspective, which I believe is shown clearly in the
calculations. As I showed above (or tried to show), the two
calculations performed on an infinite time series are exactly
equivalent, and when performed on a finite time series the only
difference is that one method includes all the samples with a
multiplier of 1 and the other uses a multiplier of 0.5 on the end
points only.

The distinction of looking at the "bins" as being around the sample or
between the samples is a red herring and has no relation to the real
math of integrating a time-sampled signal.

|     |     |     |--+--|
|     |     |--+--|  *  |
|     |--+--|  *  |  *  |
|--+--|  *  |  *  |  *  |
|  *  |  *  |  *  |  *  |
|  *  |  *  |  *  |  *  |
  t0    t1    t2    t3
   3     4     5     6

In this example, the + indicates the value of the signal f(t) at that
point in time.
The integral calculation assumes that the value of f(t)
between the time samples is a straight line. By treating the
period as "1" and summing the values, you are in effect calculating
the areas shown above, with each point centered in its rectangle. The
areas to the left and right of the point, between the top of the
rectangle and the straight-line approximation, have opposite signs,
since one makes the area larger than it should be and the other makes
it smaller. But they are equal in value and cancel.

The other method draws rectangles between the points with a height
equal to the average of each two points. This is a much more complex
calculation and gives the same result. There is no reason to
complicate the calculations using this method. Neither
calculation has a basis in the physics of the problem. They are just
equivalent numerical approximations.

Rick
______________________________
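The rectangle picture can be checked numerically. This Python sketch uses hypothetical helper names and assumes unit sample spacing with t0 = 0, and that the diagram's samples 3, 4, 5, 6 lie on the straight line f(t) = 3 + t. It compares each estimate with the exact integral over the interval that estimate actually covers: the plain sum treats each sample as the centre of a unit-wide rectangle, so it covers [t0 - 0.5, t3 + 0.5], while the end-halved (trapezoidal) sum covers [t0, t3].

```python
def plain_sum(samples):
    """Sum of unit-wide rectangles centred on each sample (spacing 1)."""
    return sum(samples)


def end_halved_sum(samples):
    """Trapezoidal sum over [t0, tN]: first and last samples weighted by 0.5."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return sum(samples) - 0.5 * (samples[0] + samples[-1])


def exact_integral_of_line(a, b, lo, hi):
    """Exact integral of f(t) = a + b*t from lo to hi."""
    return a * (hi - lo) + 0.5 * b * (hi * hi - lo * lo)


# Samples from the diagram, assumed to lie on f(t) = 3 + t at t = 0, 1, 2, 3.
ts = [0, 1, 2, 3]
samples = [3 + t for t in ts]

print(plain_sum(samples), exact_integral_of_line(3, 1, -0.5, 3.5))  # 18 18.0
print(end_halved_sum(samples), exact_integral_of_line(3, 1, 0, 3))  # 13.5 13.5
```

For a straight line both estimates are exact; they differ only because the intervals they cover differ in length by one sample period.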
On Jul 6, 1:25 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> rickman <gnu...@gmail.com> wrote:
>
> (snip, someone wrote)
>
> <> http://sunnyeves.blogspot.com/
>
> < That calculation looks complex, but isn't it really just
>
> < int(n) = int(n-1) + 0.5 * (f(n) - f(n-1))
>
> < The 0.5 times the difference assumes that the function is a straight
> < line between the two end points and when added to the min value is
> < just the average of the two points. This is an approximation, but
> < depending on your needs will be adequate.
>
> < I would argue that for an arbitrary function, there is no advantage to
> < using the average of each two points over just summing the points.
> < Consider points 0 to N where N is a large number.
>
> < \sum_{n=1}^{N} f(n) = f(1) + f(2) + \cdots + f(N)
>
> < \sum_{n=1}^{N} \frac{f(n) + f(n-1)}{2} = \frac{1}{2} f(0) + f(1) + f(2) + \cdots + f(N-1) + \frac{1}{2} f(N)
>
> < Notice that the only difference is that the average needs an extra
> < input point to calculate the first average and that the two end points
> < of the summation are halved. Numerically the difference between the
> < two calculations is 0.5 * (f(N) - f(0)). It appears to me to be a
> < very minuscule error to just add all the points without the complexity
> < of averaging. I would bet that for any value of N, 256 or over, this
> < error in the integral is much less than the error you get by the
> < original straight line average approximation.
>
> Sure, but for small N it might be important.

Actually, the difference is that for a finite series, one approximates
the integral of the curve at N points and the other approximates the
integral of N-1 points. This is easy to see if you consider the width
of the area being approximated. For a constant value using two
points, the simple sum is twice the value of a single point, and three
points give a value half again as large as two points.
Using the average method you can't calculate a value for one point,
and the value for three points is *twice* the value for two points.
That's because the area being approximated using three points is only
two periods wide, not three.

Time-sampled series are not continuous functions and cannot be
analyzed the same way.

Rick
______________________________
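Rick's constant-value argument, and the quoted claim that the two totals differ by half the difference of the end samples, can both be checked with a few lines of Python. This is a sketch with hypothetical helper names; following the quoted formulas, the identity compares the sum of f(1) to f(N) with the average method over f(0) to f(N).

```python
def simple_sum(samples):
    """Add every sample: k samples give k terms."""
    return sum(samples)


def average_method(samples):
    """Sum the average of each adjacent pair: k samples give k - 1 terms."""
    if len(samples) < 2:
        raise ValueError("the average method needs at least two samples")
    return sum(0.5 * (a + b) for a, b in zip(samples, samples[1:]))


c = 2.0  # an arbitrary constant signal value
for k in (1, 2, 3):
    pts = [c] * k
    avg = average_method(pts) if k > 1 else None
    print(k, simple_sum(pts), avg)
# 1 2.0 None
# 2 4.0 2.0
# 3 6.0 4.0

# Quoted identity: sum of f(1..N) minus the average method over f(0..N)
# equals 0.5 * (f(N) - f(0)).
f = [1.0, 4.0, 2.0, 7.0, 3.0]
print(simple_sum(f[1:]) - average_method(f), 0.5 * (f[-1] - f[0]))  # 1.0 1.0
```

Going from two to three constant samples, the simple sum grows by a factor of 1.5 and the average method by a factor of 2, matching the widths of 3 versus 2 periods and 2 versus 1 period.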
rickman <g...@gmail.com> wrote:

(snip, I wrote)

<> This is confusing. The points have zero width and integrate
<> to zero. Left out is the actual limit for the integral. If it
<> includes one half a sample period before the first point, and
<> one half after the last point, then just add. If it doesn't,
<> then you need to divide the first and last point by two.

(after you wrote)

<> < The mistake that is often made when dealing with discrete time samples
<> < is thinking that they are the same as the instantaneous values of a
<> < continuous function. In reality they are already integrals of the
<> < amplitude and the sample period (1/f). That is why you only need to
<> < sum them to obtain the integral over a series.

<> For the sampling theorem to work they must be samples of the
<> instantaneous value. Otherwise the result is filtered and
<> the results will be wrong when used as filter input (or almost
<> any other use.)

(snip)

< What you have said is a self-contradiction. If there is no averaging
< at all (an impossibility - all measurements are made within a finite
< time window) and the point measures a value at an exact point in time,
< then there is no bin to be in the center of or at the edge of. They
< are just measurements of a point in time. What would define the bin?
< In fact, what *IS* a bin in this context?

I read both comp.dsp and comp.arch.fpga, and sometimes there is
overlap between them. An important part of digital signal processing
is the sampling theorem which, briefly (and ignoring non-baseband
signals), states that a signal sampled at more than twice its highest
frequency component can be reconstructed from those samples.

For any problem involving sampled signals, the assumption, then, is
that it was properly filtered before sampling. As integration is an
averaging, or low-pass filtering, process, it tends to reduce any
error in the input data, including sampling error.
< It is a matter of perspective which I believe is shown clearly in the
< calculations. As I showed above (or I tried to show) the two
< calculations performed on an infinite time series are exactly
< equivalent and when performed on a finite time series the only
< difference is that one method includes all the samples with a
< multiplier of 1 and the other uses a multiplier of 0.5 only on the end
< points.

Yes. If you add up (n+1) points and want the average value, you
should divide by (n+1), not n. Consider the case where all sample
points have the same value. If the length of the continuous integral
is n, adding up (n+1) points will give a result too large by a factor
of (n+1)/n. On the other hand, the (n+1) points could be over an
interval of length (n+2).

If the length is n, the sum of the weights of the points should be n.
(That is, for at least a 0th-order approximation.) If the length is n,
one solution is to multiply the first and last points by 0.5; another
is to multiply all points by n/(n+1).

< The distinction of looking at the "bins" as being around the sample or
< between the samples is a red herring and has no relation to the real
< math of integrating a time sampled signal.

< |     |     |     |--+--|
< |     |     |--+--|  *  |
< |     |--+--|  *  |  *  |
< |--+--|  *  |  *  |  *  |
< |  *  |  *  |  *  |  *  |
< |  *  |  *  |  *  |  *  |
<   t0    t1    t2    t3
<    3     4     5     6

< In this example, the + indicates the value of the signal f(t) at that
< point in time. The integral calculation assumes that the value of
< f(t) between the time samples is a straight line. By treating the
< period as "1" and summing the values, you are in effect calculating
< the areas shown above with the point centered in the rectangles. The
< area on the left and right of the point between the top of the
< rectangle and the straight line approximation have opposite signs
< since one is making the area larger than it should be and the other is
< making it smaller. But they are equal in value and cancel.

I agree.
What is important is the total length of the region being integrated.
That could be n, n+1, n+2, or almost anything else. (For uniform
sampling, hopefully not too far from one of those.)

< The other method draws rectangles between the points with a height
< equal to the average of each two points. This is a much more complex
< calculation and gives the same result. There is no reason to
< complicate the calculations using this calculation. Neither
< calculation has a basis in the physics of the problem. They are just
< equivalent numerical approximations.

If the actual length of the integral isn't too important, I 100%
agree. If it is somewhat important, as it often is for not-so-large n,
then I don't agree.

Numerical integration routines are often ranked by the highest-order
polynomial for which they give an exact result. (Even though it is
rare to actually integrate polynomials numerically.) If you don't even
get zero order right, that isn't good. Still, in the large-n limit it
isn't far off.

-- glen
______________________________
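glen's ranking criterion can be made concrete. The sketch below uses hypothetical helper names and assumes unit spacing, samples at t = 0..n, and integration over [0, n]. It uses exact rational arithmetic to find the highest polynomial degree each rule integrates exactly: the plain sum, the plain sum rescaled by n/(n+1), and the end-halved (trapezoidal) sum. Simpson's rule is not discussed in the thread; it is included only as a familiar higher-order point of comparison.

```python
from fractions import Fraction


def plain_sum(y):
    return sum(y)


def rescaled_sum(y):
    n = len(y) - 1  # number of unit intervals
    return sum(y) * Fraction(n, n + 1)


def trapezoid(y):
    return sum(y) - (y[0] + y[-1]) * Fraction(1, 2)


def simpson(y):
    n = len(y) - 1
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    weights = [1] + [4 if i % 2 else 2 for i in range(1, n)] + [1]
    return Fraction(1, 3) * sum(w * v for w, v in zip(weights, y))


def highest_exact_degree(rule, n, max_degree=8):
    """Largest d such that rule integrates t**0 .. t**d exactly on [0, n]; -1 if none."""
    best = -1
    for k in range(max_degree + 1):
        y = [Fraction(t) ** k for t in range(n + 1)]
        exact = Fraction(n ** (k + 1), k + 1)
        if rule(y) != exact:
            break
        best = k
    return best


for rule in (plain_sum, rescaled_sum, trapezoid, simpson):
    print(rule.__name__, highest_exact_degree(rule, n=8))
# plain_sum -1
# rescaled_sum 1
# trapezoid 1
# simpson 3
```

On [0, n] the plain sum fails even for a constant, which is the "zero order" complaint. Either fix glen suggests, halving the end points or scaling by n/(n+1), restores exactness for constants and, on this symmetric grid, for straight lines as well.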
rickman <g...@gmail.com> wrote:

(snip, someone wrote)

<> < int(n) = int(n-1) + 0.5 * (f(n) - f(n-1))

(snip)

< Actually, the difference is that for a finite series, one approximates
< the integral of the curve at N points and the other approximates the
< integral of N-1 points.

I would call it an integral of length N or N-1, but yes.

< This is easy to see if you consider the width
< of the area being approximated. For a constant value using two
< points, the simple sum is twice the value of a single point and three
< points give a value half again as large as two points. Using the
< average method you can't calculate a value for one point and the value
< for three points is *twice* the value for two points. That's because
< the area being approximated using three points is only two periods
< wide, not three.

< Time sampled series are not continuous functions and cannot be
< analyzed the same way.

Sampled series are often approximations of continuous functions.
For band-limited continuous functions, the sampled series can exactly
represent the continuous function. (That is, sampled but not
quantized.)

As I said previously (before reading your post), it is usual to
indicate the highest-order polynomial for which a numerical
integration routine gives an exact answer. (Ignoring any rounding or
quantization errors.)

Within certain constraints, time-sampled series can be analyzed in a
way comparable to continuous functions.

-- glen
______________________________
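glen's statement that a band-limited signal is exactly represented by its samples is the Whittaker-Shannon interpolation formula, f(t) = sum over n of f(n) * sinc(t - n) for unit sample spacing. The Python sketch below checks it numerically on one assumed test signal, f(t) = sinc(t/4)^2, whose spectrum lies below a quarter of the sample rate. The infinite sum is truncated, so agreement holds only to within a small truncation error. For this particular signal the plain sum of the samples also matches the continuous integral over all t (which is 4). That is an illustration, not a general proof.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def f(t):
    """Assumed band-limited test signal (spectrum within |freq| < 1/4 cycle/sample)."""
    return sinc(t / 4) ** 2


def reconstruct(t, half_width=2000):
    """Truncated Whittaker-Shannon interpolation from unit-spaced samples f(n)."""
    return math.fsum(f(n) * sinc(t - n) for n in range(-half_width, half_width + 1))


for t in (0.5, 2.25, 7.9):
    print(t, f(t), reconstruct(t))

# Plain sum of the samples vs. the continuous integral of f over all t (= 4).
print(math.fsum(f(n) for n in range(-100_000, 100_001)))
```

The remaining gap in the last line comes from truncating the sum at |n| = 100000, not from the sampling itself.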