SPDCA:

Having trouble sleeping, so I'll write a bit about the latest adventures of this algorithm.

Amazingly, I had the opportunity to use this at work. I needed to design a tool that could consume log messages from another data source. These logs were sequentially numbered (like idkfa posts), arrived in monotonically increasing order, and came through very, very fast (say, 80MB in a few seconds).

My log consumer needed to be robust and fast, and in addition to the statistical features it would gain from scanning the log messages, it would need to "remember" whether it had already seen a given message, in case a message was a) accidentally or programmatically repeated, or b) explicitly sent again by an administrator. That meant I needed a way to accurately track the messages I had seen that would be 1) quick to access and 2) light on memory. I decided to take the method I had for idkfa and implement it in Perl.

The "quick" part took about a week and a half to figure out, as I was having to continuously rewrite my code to try to keep up with the data stream. Performing a linear search was far, far too slow, even in the beginning, and would be unusable after only a short period of normal usage. By the time I was done, I was using two Red-Black trees as lower and upper bound indexes (similar to the database index I use on idkfa), as well as a few other tricks made possible by having data structures stored in memory rather than a database.

Unfortunately, memory usage became a killer issue. While my algorithm could keep up and scale well as it "saw" more data, there was never any guarantee that the "gaps" in the log messages would be filled. I was collecting such a huge amount of data that even if I was missing only 1 in 1000 messages, within hours my process would run out of memory from storing all of the gaps.

It was at that point I had to make a compromise. There was no way I was going to be able to build more complexity into the tool just to track messages, and the chance that I would ever see a gap filled grew smaller and smaller as time went on.

My decision was this: I would store only 1000 bounds. When it came time to add another bound, I would take the "oldest" two bounds (those I was least likely to see again) and merge them. This meant that although I was introducing inaccuracy into the system, I was introducing it at the point where I was least likely to need accuracy. If I can still be 99% accurate over my configured time period, then I think this is a reasonable compromise.
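
Sketched against the @bounds array above (the 1000 cap is from the post; the rest is illustrative, not the production code):

    # Cap the number of stored bounds. When the cap is exceeded, collapse
    # the two lowest-numbered (i.e. oldest) bounds into one, which silently
    # marks the gap between them as "seen"; that is the deliberate inaccuracy.
    my $MAX_BOUNDS = 1000;

    sub enforce_limit {
        return unless @bounds > $MAX_BOUNDS;
        my $merged = [ $bounds[0][0], $bounds[1][1] ];
        splice @bounds, 0, 2, $merged;
    }

    # After every message:
    #   mark_seen($id);
    #   enforce_limit();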

If idkfa's seen/unseen system bogs down under the sheer quantity of posts we're adding, this may become a viable option. In that case, as you read, you would slowly "auto-read" the oldest posts whenever you read new posts that couldn't be merged into the currently stored boundaries. The system would be less accurate, but only for older posts that you probably weren't going to read anyway.

I don't see that happening any time soon. More just making a note for the future.

#3983, posted at 2012-01-17 03:31:45 in idkfa