VERSION: 2005/7/5

Compliance and Conformity, Exception and Anomaly, in Science and Engineering

An iAstro (COST Action 283) Workshop, RHUL, Friday 8 and Saturday 9 July 2005.
(iAstro address: www.iAstro.org)

Times: Friday, 8 July, 09:00-17:30; Saturday, 9 July, 09:30-15:00.

(Note: the Royal Holloway, University of London, campus is in Egham, west of London. This meeting however will be at RHUL offices in the centre of London.)

Venue: Boardroom, RHUL, 2 Gower Street, London WC1E 6DP. Reception desk just around the corner at 11 Bedford Square. You should sign in at the reception first, to obtain the door code for 2 Gower Street. [See map: RHULGowerSt.doc]

Themes of the workshop include:
1. Target and faint feature detection, from imagery and other signals.
2. Abnormal tissue in medical imaging.
3. Two-class classification problems with very imbalanced priors.
4. One-class classification, to determine conformity and anomaly.
5. Novelty detection, e.g. in machine condition monitoring.
6. Serendipitous knowledge discovery in data mining.
7. Attention.

Applications:
- Astronomy
- Medical imaging
- Forensic imaging
- Video compression
- Vision in sport
- Vision in transport
- Vision in surveillance
- One-class problems in industrial machine vision

Proceedings: web-based after the event.

PROGRAMME

FRIDAY MORNING

09:30-10:00  1. Rafael Molina (Granada), "Bayesian signal restoration and super-resolution in imaging and video"
10:00-10:30  2. Jean-Luc Starck (CEA, Saclay), "Morphological Component Analysis and its Applications"
10:30-10:45  Coffee/tea
10:45-11:15  3. Albert Bijaoui, Ch. Benoist, A. Guennec & E. Slezak (OCA, Nice), "Analysis of a set of multiband images in the framework of a Virtual Observatory"
11:15-11:45  4. Fionn Murtagh (RHUL), "Model-based segmentation of 2D and 3D multiband images"
11:45-12:00  5. Discussion: Whither Image and Signal Processing?

FRIDAY AFTERNOON

14:00-14:30  6. Volodya Vovk (RHUL), "Transduction learning, conformal prediction"
14:30-15:00  7.
Boris Mirkin, CS, Birkbeck Univ London, "A one-class clustering analogue to Principal Component Analysis".
15:00-15:15  Coffee/tea
15:15-15:45  8. Antoine Mahul (ISIMA, Clermont-Ferrand), traffic engineering with neural nets.
15:45-16:15  9. Andrew Moore (Computer Laboratory, Univ of Cambridge and Intel Research), data mining of telecoms traffic data.
16:15-16:45  10. David Dowe (Monash University), "MML, Continuous and Discrete Variables, Classification, Clustering and Generalised Bayes Nets"
16:45-17:00  11. Discussion: Whither Signal Modelling and Learning?

18:30-  Dinner, Konaki Greek Restaurant (5 Coptic Street, very close to the workshop venue; see http://www.viewlondon.co.uk/info_restaurant_3085.html for details and a map).

SATURDAY MORNING

09:30-09:50  12. Alex Aussem (Univ Lyon), "Mining correlated patterns and causal relationships for the identification of Bayesian network structures"
09:50-10:10  13. Vito Di Gesu', Giosue' Lo Bosco, Jerome H. Friedman (Palermo, Stanford), "New measure of similarities for mining rules"
10:10-10:25  Coffee/tea
10:25-10:55  14. Hui Wang (Univ Ulster), "A new similarity measure for relational data"
10:55-11:25  15. Roy Davies, Physics, RHUL, "Use of sampling and attention-based processing for speeding up image sequence analysis in a transport setting".
11:25-11:55  16. Vito Di Gesu', Bertrand Zavidovique (Palermo, Paris Sud), "On the measures of symmetries".
11:55-12:15  17. Balance Sheet: Whither Data Mining?

POSTERS

S. O'Tuairisg and A. Shearer (NUI Galway), "Enhancing imaging signatures of supernova remnants using point-spread function matching techniques"

SATURDAY AFTERNOON

14:00-15:00  Management Committee meeting (all welcome)

Agenda
1. Report on budget, and events up to November 2005. (F. Murtagh)
2. Report on School, Capri, September 2005. (G. Longo)
3. Report on Annual TIST meeting, June-July 2005. (F. Murtagh)
4. Update on plans for a future COST Action. (F. Murtagh)
5. Discussion of other future plans (Marie Curie RTN?)
and events (Astrostatistics/SCMA/SAMSI).
6. Proposal for web services in pattern recognition: from Tin Kam Ho (Bell/Lucent, chair, IAPR TC13) - "I have been talking with Chris Miller [vice chair, NOAO] and he has some channels to set up web services to post pattern recognition algorithms. We will work on it more and will discuss with the group with it is ready."
7. COST evaluation of iAstro.
8. AOB

Accommodations

There are many possibilities in central London. One is the Ibis London Euston, near Euston Station and a short walk from the iAstro event venue. According to the web, it costs GBP 79.95 for a weekday night and GBP 74.95 for a weekend night.
http://www.ibishotel.com/ibis/fichehotel/gb/ibi/0921/fiche_hotel.shtml

End.

Abstracts

Title: A new similarity measure for relational data
Hui Wang, University of Ulster

Abstract: Most similarity measures work for relational data and the ideas behind them do not generalise to other types of data. For relational data they are usually restricted to either categorical or numerical attributes, and some are confined to classification tasks. In this talk I will present a similarity measure that works for relational data, is task-independent and applies to both categorical and numerical data in a conceptually uniform way. This similarity measure is derived rigorously from a probability function and corresponds to the intuition that if we consider a system of neighbourhoods around a data point, the data points closer to this point should be included in more of these neighbourhoods than more distant points. Experiments on a large number of public datasets show that it fares very competitively with commonly used similarity measures, and its average performance is consistently the best.

Similarity measures for grouping, classification, and mining
Vito Di Gesu', Jerome H.
Friedman, Giosue' Lo Bosco
DMA - Univ. Palermo, Italy
DS - Stanford University, USA

Variability and noise in data sets make it hard to discover important regularities among association rules in mining problems. There is a need for flexible and robust similarity measures. This paper considers a new class of similarity measures (SMs) for grouping association rules by standard clustering techniques. Moreover, the definition of the SMs is extended to work on non-numeric or symbolic feature spaces. The proposed SMs are evaluated on simulated data sets for grouping association rules. An application to real data is also considered, in order to test their performance in recognizing users of a UNIX system.

On the measures of symmetries
Vito Di Gesu', Bertrand Zavidovique
DMA - Univ. Palermo, Italy
IEF - Univ. Paris-Sud, France

Several algorithms have been considered to compute local and global object symmetries. In the case of gray level digital images, it is of great interest to assign a degree of symmetry to each direction in the digital retina. This information can be used to define new features for classification problems and to find directionality in the feature space. In computer vision, the evaluation of local symmetries is a powerful tool for determining ROIs (Regions Of Interest). This paper will introduce new measures of symmetry; their properties will be discussed, and some experiments will be done on artificial and real data.

Fixation-based processing for vehicle guidance
Professor Roy Davies
Machine Vision Group, Department of Physics
Royal Holloway, University of London
Egham, Surrey, TW20 0EX

In setting up real-time inspection applications, it is necessary to obtain as much speedup of the basic algorithm as possible, in order to save on hardware costs. Image sampling has been found useful for this purpose. Moving to the quite different vehicle guidance scenario, much the same applies, but the most appropriate way to sample the images is less clear.
This paper discusses some basic work on this task, which can be construed as applying attention to, and fixating on, multiple locations in the images. In straightforward cases, only a few percent of the image pixels are accessed, thereby greatly reducing execution time. Some intriguing analogies with the human visual system arise, but these must not be taken too far as there is no reason why machine vision should evolve in exactly the same way as human vision.

Analysis of a set of multiband images in the framework of a Virtual Observatory
Albert Bijaoui, Ch. Benoist, A. Guennec & E. Slezak
Laboratoire Cassiopée UMR 6202, Observatoire de la Côte d'Azur, BP 4229, 06304 Nice Cedex 4 (France)

In the framework of a Virtual Observatory, the same sky region may be observed in a large number of spectral bands. Even for a few bands, it is not easy to match the objects identified in each image and to get consistent measurements. An approach based on a fusion image is proposed. Different algorithms were tested for building such an image. The best results are obtained with the one that includes a deconvolution using a wavelet approach. Multiscale algorithms allow us to increase the detection rate compared to classical ones based on pixel thresholding. The measurements of all the objects are then made from the radial profiles. This allows us to get consistent measurements from images obtained with different filters. The main difficulty lies in the photometrical classification. The number of significant classes increases very rapidly with the number of bands. Principal component analysis, based on photometrical correlations, does not allow us to reduce the number of classes by a significant factor. Instead of trying to identify all the photometrical classes, it is possible to describe the spectral distribution as a superimposition of the distributions of different pure elements.
This mixing can be due either to a true object superimposition or to the description of an intermediate class as the interpolation between different pure elements. The proposed method starts from a classification built from photometrical ratios. The source separation is then done by a CLEAN-like algorithm.

David Dowe (Monash University)
MML, Continuous and Discrete Variables, Classification, Clustering and Generalised Bayes Nets

We first provide an overview of Minimum Message Length (MML) (Wallace and Boulton, Comp. J., 1968; Wallace and Freeman, J. Royal Stat. Soc., 1987; Wallace and Dowe, 1999a; Wallace (posthumous), 2005, ISBN: 0-387-23795-X). MML gives invariant Bayesian point estimation (Wallace and Boulton, 1975) by constructing a two-part message encoding the theory (or hypothesis) followed by the data encoded in terms of the hypothesis. The fact that all the recorded data will be of finite accuracy and that we are willing to truncate our parameter estimates to within an uncertainty (typically proportional to the reciprocal of the square root of the expected Fisher information) enables us typically to gain substantial compression. MML can be thought of as posterior maximisation done properly. Armed with an invariant Bayesian way of coding both (appropriately truncated) continuous and discrete parameters, we then briefly survey some applications of MML to problems in statistics and machine learning. These include clustering (mixture modelling), classification (decision trees and graphs), time series autoregression and other regression, and generalised Bayes nets (with continuous and discrete attributes).
Mention is made of the author's question (1998) as to whether only Strict MML and closely related Bayesian methods can in general guarantee both statistical invariance and statistical consistency; and we mention an application of a preliminary scheme for MML mixture modelling with single factor analysis (Edwards and Dowe, 1998) to a problem in Infrared Astronomical Satellite (IRAS) Low Resolution Spectroscopy (LRS) data.
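
Illustrative note (not part of the abstract): the two-part message idea described above can be sketched for the simplest case, s successes in n Bernoulli trials. The sketch below assumes the Wallace-Freeman (1987) approximation with a uniform prior on the success probability p; the function names are hypothetical and chosen only for this example. Under those assumptions the message length is minimised at p = (s + 1/2) / (n + 1), and the parameter is stated to a precision proportional to the reciprocal of the square root of the expected Fisher information.

```python
import math

# One-dimensional quantising lattice constant used in the
# Wallace-Freeman approximation.
KAPPA_1 = 1.0 / 12.0


def fisher_information(n: int, p: float) -> float:
    """Expected Fisher information for n Bernoulli trials at probability p."""
    return n / (p * (1.0 - p))


def message_length(s: int, n: int, p: float) -> float:
    """Approximate two-part message length in nats (hypothetical helper).

    Assumes a uniform prior h(p) = 1 on (0, 1) and the likelihood of one
    particular sequence with s successes, p**s * (1 - p)**(n - s).
    """
    neg_log_prior = 0.0  # -log h(p) with h(p) = 1
    half_log_fisher = 0.5 * math.log(fisher_information(n, p))
    neg_log_likelihood = -(s * math.log(p) + (n - s) * math.log(1.0 - p))
    quantisation_cost = 0.5 * (1.0 + math.log(KAPPA_1))
    return neg_log_prior + half_log_fisher + neg_log_likelihood + quantisation_cost


def mml_estimate(s: int, n: int) -> float:
    """Minimiser of message_length over p, under the assumptions above."""
    return (s + 0.5) / (n + 1.0)


def statement_precision(n: int, p: float) -> float:
    """Width of the uncertainty region used to state p: 1 / sqrt(KAPPA_1 * F)."""
    return 1.0 / math.sqrt(KAPPA_1 * fisher_information(n, p))


if __name__ == "__main__":
    s, n = 3, 10
    p_hat = mml_estimate(s, n)
    print("estimate:", p_hat)
    print("message length (nats):", message_length(s, n, p_hat))
    print("precision of stated p:", statement_precision(n, p_hat))
```

Unlike the maximum likelihood estimate s / n, this estimate stays strictly inside (0, 1) even when s = 0 or s = n, which is one small-sample consequence of charging for the statement of the hypothesis as well as for the data.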