By Michael M. Richter (auth.), Petra Perner (eds.)

ICDM / MLDM Medal (limited edition), Meissner Porcellan, the "White Gold" of King August the Strong of Saxonia

ICDM 2007 was the 7th event in the Industrial Conference on Data Mining series and was held in Leipzig. For this edition the Program Committee received 96 submissions from 24 countries (see Fig. 1). After the peer-review process, we accepted 25 high-quality papers for oral presentation, which are included in this proceedings book. The topics range from aspects of classification and prediction, clustering, Web mining, data mining in medicine, and applications of data mining to time series and frequent pattern mining and association rule mining.

[Fig. 1. Distribution of papers among countries]

Twelve papers were selected for poster presentations, which are published in the ICDM Poster Proceedings Volume.

Numerical features $F_i$ with domain $D_i = \mathbb{R}$ are first normalized by the mapping $f_i \mapsto f_i / (\max - \min)$, where $\max$ and $\min$ denote, respectively, the largest and smallest value for $F_i$ observed so far; these values are permanently updated. Then, $\delta_i(f_i, f_i')$ is defined by the distance between the normalized values of $f_i$ and $f_i'$. For a discrete attribute $F_j$, the distance between two values $f_j$ and $f_j'$ is defined by the following measure:

$$\delta_j(f_j, f_j') = \sum_{k=1}^{m} \left| P(\lambda_k \mid F_j = f_j) - P(\lambda_k \mid F_j = f_j') \right|,$$

where $m$ is the number of classes and $P(\lambda \mid F = f)$ is the probability of the class $\lambda$ given the value $f$ for attribute $F$.
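The two per-attribute distances described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `RunningRange`, `class_conditionals` and `discrete_distance` are hypothetical, the class probabilities are estimated by simple relative frequencies, and the discrete measure is taken to be a sum of absolute differences.

```python
from collections import Counter, defaultdict


class RunningRange:
    """Tracks the smallest and largest value seen so far for one numerical feature."""

    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def update(self, value):
        # The range is updated with every new observation from the stream.
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def distance(self, a, b):
        span = self.hi - self.lo
        if span <= 0:  # fewer than two distinct values observed (assumed fallback)
            return 0.0
        # Distance between the normalized values f / (max - min).
        return abs(a / span - b / span)


def class_conditionals(values, labels):
    """Estimate P(class | attribute value) by relative frequencies (an assumption)."""
    counts = defaultdict(Counter)
    for v, y in zip(values, labels):
        counts[v][y] += 1
    return {
        v: {y: n / sum(c.values()) for y, n in c.items()}
        for v, c in counts.items()
    }


def discrete_distance(probs, a, b, classes):
    """Sum over all classes of |P(class | a) - P(class | b)|."""
    return sum(abs(probs[a].get(k, 0.0) - probs[b].get(k, 0.0)) for k in classes)


# Usage: one numerical and one discrete attribute.
rng = RunningRange()
for v in [2.0, 4.0, 10.0]:
    rng.update(v)
numeric_d = rng.distance(4.0, 10.0)

probs = class_conditionals(["red", "red", "blue", "blue"], ["a", "b", "a", "a"])
discrete_d = discrete_distance(probs, "red", "blue", classes=["a", "b"])
```

Two attribute values that induce the same class distribution get distance 0 under the discrete measure, regardless of how different the values themselves look.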

Thus, it is a kind of real accuracy that refers to a certain section of the stream. (ii) The absolute classification rate aims at estimating the accuracy at a particular moment in time. To this end, 1,000 extra test instances are generated at random according to a uniform distribution, and the classification accuracy for this test set is derived by using the current case base; this is done at every 10th time point. We conducted experiments with 8 different data streams (see Table 1), some of which have already been used in the literature: the streams Gauss, Sine2, Stagger and Mixed were used in [14], and the Hyperplane data (which we generated for the dimensions d = 2 and d = 5) was used in multiple experiments on data streams in [28].
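The absolute classification rate described above can be sketched as follows. The sketch assumes a synthetic stream whose current concept is available as a labelling function, and an instance space of [0, 1]^d; the names `absolute_classification_rate`, `track_absolute_rate`, `learn`, `predict` and `concept_at` are hypothetical and not taken from the paper.

```python
import random


def absolute_classification_rate(predict, true_label, dim, n_test=1000, rng=None):
    """Accuracy of `predict` on fresh test points drawn uniformly from [0, 1]^dim."""
    if rng is None:
        rng = random.Random()
    hits = 0
    for _ in range(n_test):
        x = [rng.random() for _ in range(dim)]
        if predict(x) == true_label(x):
            hits += 1
    return hits / n_test


def track_absolute_rate(stream, learn, predict, concept_at, dim, every=10, seed=0):
    """Feed the stream to the learner and measure the rate at every `every`-th step."""
    rng = random.Random(seed)
    rates = {}
    for t, (x, y) in enumerate(stream):
        learn(x, y)  # update the model (e.g. the case base) with the new example
        if t % every == 0:
            # concept_at(t) returns the labelling function valid at time t
            rates[t] = absolute_classification_rate(
                predict, concept_at(t), dim, rng=rng
            )
    return rates


# Usage with a fixed threshold concept and a "learner" that already knows it.
def concept(x):
    return int(x[0] > 0.5)


stream = [([0.1 * i % 1.0, 0.3], 0) for i in range(25)]
rates = track_absolute_rate(
    stream,
    learn=lambda x, y: None,
    predict=concept,
    concept_at=lambda t: concept,
    dim=2,
)
```

Because the test instances are drawn independently of the stream, this estimate reflects the quality of the model at one moment rather than over a stretch of past examples.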

5 MML-Based Learning of the Statistical Models

The Minimum Message Length Principle (MML) [21] can be used as a learning approach for updating the existing statistical model, as well as for learning new statistical models, when enough data are available within the temporary collection. From an information-theory point of view, the minimum message length approach is based on evaluating statistical models according to their ability to compress a message containing the data. High compression is obtained by forming good models of the data to be coded.
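The compression view can be illustrated with a deliberately simplified two-part code: the message first states the model, then the data encoded under that model, and the model with the shortest total message is preferred. The sketch below is a toy example for coin-flip data, not the estimator used in the paper; the function names and the choice of 4 bits for stating a bias parameter are assumptions made for illustration.

```python
import math


def data_bits(data, p):
    """Bits needed to encode binary data under a Bernoulli(p) model, 0 < p < 1."""
    return -sum(math.log2(p if x == 1 else 1.0 - p) for x in data)


def message_length(data, p, model_bits):
    """Two-part message: bits to state the model plus bits for the data given it."""
    return model_bits + data_bits(data, p)


def compare(data):
    """Compare a fixed fair coin (no parameter to state) with a biased coin.

    The biased model is assumed to cost 4 bits for stating its parameter.
    """
    fair = message_length(data, 0.5, model_bits=0.0)
    biased = message_length(data, 0.8, model_bits=4.0)
    return fair, biased


small = [1] * 8 + [0] * 2    # 10 observations, 80% ones
large = [1] * 80 + [0] * 20  # 100 observations, 80% ones

fair_small, biased_small = compare(small)
fair_large, biased_large = compare(large)
```

With only 10 observations the extra bits spent on the parameter are not recovered, so the simpler model yields the shorter message; with 100 observations the better-fitting model compresses the data enough to win. This mirrors the remark that new models are learned once enough data are available.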

