Research Interests

 

Data Mining, Machine Learning, Combinatorics, Bioinformatics, Predictive Analytics, Business Intelligence, Algorithms, Statistics.

 

 

Dissertation

Outlier Detection by Network Flow [Introduction] [Full 2.4M]

Defense [online presentation] (requires the latest version of IE; allow blocked content.)

 

If you find my dissertation too obscure to read, Outliers by Malcolm Gladwell is more enjoyable, but mine is free.

 

 

 

 

Publications

 

1.       Ying Liu, Xin Chen, and Chengcui Zhang. Semantic clustering for region-based image retrieval. Journal of Visual Communication and Image Representation. (accepted)

2.       Ying Liu and Alan P. Sprague. Outlier detection and evaluation by network flow. Int. J. Computer Applications in Technology, Vol. 33, Nos. 2/3, 2008, pp. 237-246.

3.       Ying Liu, Xin Chen, Chengcui Zhang, and Alan Sprague. An Interactive Region-Based Image Clustering and Retrieval Platform. IEEE International Conference on Multimedia & Expo (ICME 2006). Jul 9-12, 2006, Toronto, Ontario, Canada.

4.       Ying Liu and Alan P. Sprague. Outlier Detection and Evaluation by Network Flow. The 2004 International Conference on Machine Learning and Applications (ICMLA'04). 16-18 December 2004, Louisville, KY, USA.

5.      Ying Liu, Alan P. Sprague and Elliot Lefkowitz. Network Flow for Outlier Detection. ACM Southeast Conference, 2004.

 

 

 

Source Code

 

(1)   FLOW (Outlier/Cluster Detection and Evaluation by Network Flow.)

 

FLOW is a novel algorithm for identifying outliers, outlier groups, and clusters. It models the data as a flow network and applies the Max-Flow Min-Cut theorem from graph theory to detect outliers and clusters. Outliers are evaluated by the volume of the network flow.
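
As a rough illustration of the max-flow/min-cut idea (a sketch, not the FLOW source code), the snippet below runs the Edmonds-Karp algorithm on a small weighted graph and returns the flow value together with the source side of a minimum cut. The function names, the toy graph, and the edge capacities are assumptions made for this example; how FLOW itself builds the network and assigns capacities is described in the dissertation, not here.

```python
from collections import deque

def undirected(edges):
    """Build a capacity map from (u, v, capacity) triples, both directions."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def min_cut(capacity, source, sink):
    """Edmonds-Karp max flow. Returns (flow_value, source_side_of_min_cut)."""
    # Residual capacities, with a zero-capacity reverse entry where missing.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def bfs():
        # Breadth-first search over edges that still have residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs()
        if sink not in parent:
            # No augmenting path is left: the reachable nodes are one side of the cut.
            return flow, set(parent)
        # Find the bottleneck capacity along the augmenting path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Toy graph: a tight triangle a-b-c and a weakly attached point x.
graph = undirected([("a", "b", 3), ("b", "c", 3), ("a", "c", 3), ("c", "x", 1)])
flow, side = min_cut(graph, "a", "x")
```

In this toy graph the only edge reaching x has capacity 1, so the cut isolates x with a small flow value, which is the kind of signal a flow-based method can use to flag a weakly connected point.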

 

If the input data file is a single cluster, FLOW can repair a poor-quality cluster generated by a clustering algorithm, i.e., it addresses the problem of points that should be separated ending up in one cluster. If the input data file is the whole data set, FLOW first finds the connected components of the k-nearest-neighbor graph and then performs outlier detection on each component. If the size of a minimum cut is larger than the minimum cluster size, FLOW continues outlier detection on the data in that minimum cut.
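
The connected-components step can be pictured with the following minimal sketch. It is an assumption-laden illustration rather than FLOW's implementation: it uses brute-force Euclidean neighbor search and a union-find structure, and the function name `knn_components` is made up for this example.

```python
import math

def knn_components(points, k):
    """Group point indices into connected components of the k-NN graph."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Brute-force search for the k nearest neighbors of point i.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            # An edge i-j merges the two components.
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
components = knn_components(points, k=2)
```

Each component returned here could then be handed to an outlier detector on its own, which mirrors the two-stage procedure described above.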

 

FLOW is efficient at finding outlier groups. For high-dimensional data, it is also efficient at detecting outliers and outlier groups. Outliers' feature patterns usually differ markedly from the features of the dominant cluster.

 

Download: Beta FLOW_3.01.tar 3.2M

 

Online Presentation (requires the latest version of IE; allow blocked content.)

 

 

(2)   LOF (LOF: Identifying Density-Based Local Outliers.)

Description: LOF is a density-based outlier detection algorithm.
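
For readers unfamiliar with LOF, here is a small self-contained sketch of the score as it is usually defined (k-distance, reachability distance, local reachability density). It is not the code in LOF.zip. As simplifying assumptions, it uses brute-force Euclidean search, takes exactly k neighbors even when distances tie, and expects no duplicate points (duplicates could make a density infinite).

```python
import math

def lof_scores(points, k):
    """Return the Local Outlier Factor of every point (brute force)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbors of each point, and the distance to the k-th one.
    neighbors = []
    k_distance = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    def reach_dist(i, j):
        # Reachability distance of i with respect to its neighbor j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / k
        lrd.append(1.0 / mean_reach)

    # LOF: average density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Four corners of a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

Scores close to 1 indicate a point whose density matches that of its neighbors; the distant point in this toy data receives a much larger score.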

 

Download: LOF.zip (727K)

 

 

(3)   ROCK (A Robust Clustering Algorithm For Categorical Attributes.)

Description: ROCK is a bottom up hierarchical clustering algorithm for categorical attributes.
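
The sketch below illustrates the link-based, bottom-up merging that ROCK is known for. It is a simplified illustration, not the code in ROCK.tar. Assumptions for this example: records are non-empty sets of items, similarity is the Jaccard coefficient, a point is not counted as its own neighbor, theta is below 1, and the best pair is found by scanning all pairs instead of using heaps.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)

def rock(records, theta, k):
    """Merge clusters bottom-up by link-based goodness until k remain."""
    n = len(records)
    # Neighbors: other records whose similarity reaches the threshold theta.
    nbrs = [
        {j for j in range(n) if j != i and jaccard(records[i], records[j]) >= theta}
        for i in range(n)
    ]
    # link(i, j): number of neighbors that records i and j have in common.
    link = {(i, j): len(nbrs[i] & nbrs[j]) for i, j in combinations(range(n), 2)}
    exponent = 1 + 2 * (1 - theta) / (1 + theta)

    def goodness(c1, c2):
        # Cross-cluster links, normalized by the expected number of links.
        cross = sum(link[min(i, j), max(i, j)] for i in c1 for j in c2)
        expected = (
            (len(c1) + len(c2)) ** exponent
            - len(c1) ** exponent
            - len(c2) ** exponent
        )
        return cross / expected

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        pairs = combinations(range(len(clusters)), 2)
        a, b = max(pairs, key=lambda p: goodness(clusters[p[0]], clusters[p[1]]))
        if goodness(clusters[a], clusters[b]) == 0:
            break  # no remaining pair shares any link
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

records = [
    {"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"b", "c", "d"},
    {"x", "y", "z"}, {"x", "y", "w"}, {"x", "z", "w"},
]
clusters = rock(records, theta=0.5, k=2)
```

Because the two groups of records share no items, no links cross between them, and merging stops with the two groups intact.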

 

Download: ROCK.tar (1.4M)

 

 

(4)   KD Tree

Description: A kd-tree is a data structure for k-nearest-neighbor search.

 

Download: KD_0.01.tar (125K)

 

 

 

 
