January, 2008

Mailing List Data Mining - The Math

Enough games. Let’s get serious for a second and do some math. We’ll start with an illustration that represents the state of a mail list with 500+ contributors over a one-year period (20,000+ emails plotted). Technically it’s a function of two parameters, F(T,C). The horizontal axis is time (T), while the vertical axis represents contributors (C). In this particular case F(T,C) is MailSent(T,C), where:

MailSent(T,C) = 1, when the contributor C sent an email at time T; 0 otherwise.

You can think of it as a forest of equally tall trees (height = 1). Ideally we would plot it as a 3D graph, which would have a bunch of dots on one surface (Z = 1). The picture we have here is the view from the top. We could introduce a third parameter E (email), but for this research we don’t really care which particular email was sent. We are only interested in the fact of sending.

In a similar manner we’ll introduce ThreadStarted(T,C), ThreadJoined(T,C) and other functions (the complete list of functions will follow in one of the next chapters). Again, we are not interested in which particular thread was started; we only need to know who started it and when. With a tiny tweak we introduce:

TrafficGenerated(T,C) = N, when the contributor C started a thread at T and the thread grew to N emails; 0 otherwise (started no threads).

In this case, it’s a forest of trees of different heights. Each tree represents one thread, and its height depends on the thread length. The tallest tree shows us the largest thread. Using the same logic we will add:

Audience(T,C) = M, when the contributor C started a thread at T and M other contributors joined the thread; 0 otherwise (started no threads).
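To make the definitions above concrete, here is a minimal Python sketch of how MailSent, TrafficGenerated and Audience could be built from raw email records. The `Email` record and its fields (`time`, `sender`, `thread`) are hypothetical names chosen for illustration, not part of the original data set. The sketch also assumes that N counts every email in a thread, including the first one, and that a contributor starts at most one thread at any given T.

```python
from collections import defaultdict
from typing import NamedTuple


class Email(NamedTuple):
    # Hypothetical record layout, chosen only for this illustration.
    time: int    # T, e.g. a day number within the year
    sender: str  # C, the contributor
    thread: str  # thread identifier


def mail_sent(emails):
    """MailSent(T, C): 1 for every (T, C) pair that sent an email.

    Pairs that are absent from the dict stand for 0.
    """
    return {(e.time, e.sender): 1 for e in emails}


def thread_functions(emails):
    """Return (TrafficGenerated, Audience), keyed by the starter's (T, C)."""
    threads = defaultdict(list)
    for e in sorted(emails, key=lambda e: e.time):
        threads[e.thread].append(e)

    traffic, audience = {}, {}
    for msgs in threads.values():
        first = msgs[0]  # earliest email is taken as the thread starter
        key = (first.time, first.sender)
        # Assumption: thread length counts the starting email as well.
        traffic[key] = len(msgs)
        # Other contributors who joined, excluding the starter.
        audience[key] = len({m.sender for m in msgs} - {first.sender})
    return traffic, audience


emails = [
    Email(1, "ann", "t1"),
    Email(2, "bob", "t1"),
    Email(2, "cat", "t1"),
    Email(3, "bob", "t2"),
    Email(5, "ann", "t1"),
]
sent = mail_sent(emails)
traffic, audience = thread_functions(emails)
```

Looking a value up with `traffic.get((T, C), 0)` gives the "0 otherwise" branch of each definition without storing the zeros.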
For any F(T,C) and a fixed time interval [T1, T2) we can introduce a set of cumulative functions:

1) Contributor’s Total: the sum of F(T,C) over all T in [T1, T2) for the selected contributor C.
Example: when F is MailSent and [T1, T2) = 'year 2007', it gives us the total number of emails sent by the selected contributor in 2007.

2) Mail List Total: the sum of the Contributor’s Totals over all contributors, where N is the total number of the mail list contributors.
Example: the total number of emails sent to the mail list in 2007.

3) Contributor’s Share: the Contributor’s Total as a fraction of the Mail List Total.
Example: 5% of all emails in 2007 were sent by the contributor C.

4) Contributor’s Rate: for any Ti within our fixed time interval [T1, T2), all contributors can be divided into two groups: those who joined the mail list before Ti (“veterans”) and those who started after Ti (“rookies”). Let’s define Ti as the time when the contributor i sent the first email within our interval (T1 ≤ Ti < T2). The rate is then the contributor’s total per unit of time, with the time delta counted from Ti.
Example: in 2007 the contributor C was sending 3.5 emails per day on average. If the time delta is measured in days, it’s a daily rate function; if we measure it in hours, it’s an hourly rate function, etc.
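The cumulative functions can be sketched in Python as follows. This is only an illustration under stated assumptions: F is stored as a dict keyed by (T, C) with zeros omitted, all function names are hypothetical, and the rate divides the contributor's total by the delta T2 - Ti, which is one reading of the definition above rather than a confirmed formula.

```python
def contributor_total(f, c, t1, t2):
    """Sum of F(T, C) over T1 <= T < T2 for one contributor C."""
    return sum(v for (t, who), v in f.items() if who == c and t1 <= t < t2)


def list_total(f, t1, t2):
    """Sum of F(T, C) over T1 <= T < T2 and over all contributors."""
    return sum(v for (t, _), v in f.items() if t1 <= t < t2)


def contributor_share(f, c, t1, t2):
    """Contributor's total as a fraction of the mail list total."""
    total = list_total(f, t1, t2)
    return contributor_total(f, c, t1, t2) / total if total else 0.0


def first_email_time(sent, c, t1, t2):
    """Ti: time of C's first email within [T1, T2), or None if there is none."""
    times = [t for (t, who), v in sent.items() if who == c and v and t1 <= t < t2]
    return min(times) if times else None


def contributor_rate(f, sent, c, t1, t2):
    """Assumed form: contributor's total divided by (T2 - Ti)."""
    ti = first_email_time(sent, c, t1, t2)
    if ti is None:
        return 0.0  # the contributor sent nothing in the interval
    return contributor_total(f, c, t1, t2) / (t2 - ti)


# MailSent for a toy list; times are day numbers, interval is [1, 11).
mail = {(1, "ann"): 1, (2, "bob"): 1, (2, "cat"): 1, (3, "bob"): 1, (5, "ann"): 1}
ann_total = contributor_total(mail, "ann", 1, 11)
all_total = list_total(mail, 1, 11)
ann_share = contributor_share(mail, "ann", 1, 11)
ann_rate = contributor_rate(mail, mail, "ann", 1, 11)
```

Because the time unit of the keys decides the unit of T2 - Ti, the same function yields a daily rate for day numbers and an hourly rate for hour numbers.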