Research Interests
Data Mining, Machine Learning, Combinatorics, Bioinformatics, Predictive Analytics, Business Intelligence, Algorithms, Statistics.
Dissertation
Outlier Detection by Network Flow [Introduction] [Full 2.4M]
Defense [online presentation] (requires the latest version of IE; allow blocked content.)
If you feel my dissertation is too obscure to read, Outliers
by Malcolm Gladwell is more enjoyable, but mine is free.
Publication
1. Ying Liu, Xin Chen, and Chengcui Zhang. Semantic clustering for region-based image retrieval.
Journal of Visual Communication and Image Representation. (accepted)
2. Ying Liu and Alan P. Sprague. Outlier detection and evaluation by network flow.
Int. J. Computer Applications in Technology, Vol. 33, Nos. 2/3, 2008, pp. 237-246.
3. Ying Liu, Xin Chen, Chengcui Zhang, and Alan Sprague. An Interactive Region-Based Image Clustering and Retrieval Platform.
IEEE International Conference on Multimedia & Expo (ICME 2006). Jul 9-12, 2006, Toronto, Ontario, Canada.
4. Ying Liu and Alan P. Sprague. Outlier Detection and Evaluation by Network Flow.
The 2004 International Conference on Machine Learning and Applications (ICMLA'04). 16-18 December 2004, Louisville, KY, USA.
5. Ying Liu, Alan P. Sprague, and Elliot Lefkowitz. Network Flow for Outlier Detection.
ACM Southeast Conference, 2004.
Source Code
(1) FLOW (Outlier/Cluster Detection and Evaluation by Network Flow.)
FLOW is a novel algorithm for identifying outliers, outlier groups, and
clusters. It is based on network flow: we use the max-flow min-cut theorem
from graph theory to detect outliers and clusters.
Outliers are evaluated by the volume of the network flow.
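To illustrate the max-flow min-cut idea mentioned above, here is a minimal Python sketch. It is not the FLOW source code. It uses the standard Edmonds-Karp method, and the function name `min_cut`, the toy graph, its edge capacities, and the choice of source and sink are all assumptions made for illustration.

```python
from collections import deque

def min_cut(capacity, source, sink):
    """Return (max_flow_value, source_side_nodes) using Edmonds-Karp.

    capacity: dict mapping node -> {neighbor: non-negative capacity}.
    An undirected edge is given by listing it in both directions.
    """
    # Residual graph: copy the capacities and add missing reverse entries.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    def bfs_parents():
        # Breadth-first search over edges that still have residual capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    flow = 0
    while True:
        parents = bfs_parents()
        if sink not in parents:
            # No augmenting path is left; the nodes still reachable from
            # the source form the source side of a minimum cut.
            return flow, set(parents)
        # Find the bottleneck capacity along the path source -> sink.
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        # Push the bottleneck amount of flow along that path.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical toy graph: a, b, c are tightly linked; d hangs on by a weak edge.
toy_graph = {
    "a": {"b": 5, "c": 5},
    "b": {"a": 5, "c": 5},
    "c": {"a": 5, "b": 5, "d": 1},
    "d": {"c": 1},
}
flow_value, source_side = min_cut(toy_graph, "a", "d")
```

In this toy graph the cheapest cut removes only the weak edge between c and d, so d is separated from the rest with a small flow value. How FLOW itself chooses sources, sinks, and capacities is not stated on this page.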
If the input data file is a single cluster, the
algorithm can repair poor-quality clusters generated by a clustering algorithm,
i.e., it addresses the problem of points that should be separated ending up in
one cluster. If the input data file is the whole data set, the algorithm
first finds the connected components using k nearest neighbors and then performs
outlier detection on each component. If the size of a minimum cut is bigger than
the minimum size of a cluster, FLOW continues outlier detection on the data in
that minimum cut.
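The connected-components step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the FLOW implementation: the function name `knn_components`, the Euclidean distance, the brute-force neighbor search, and the decision to treat the k-nearest-neighbor graph as undirected are all assumptions.

```python
import math

def knn_components(points, k):
    """Group point indices into connected components of a kNN graph."""
    n = len(points)
    adjacency = [set() for _ in range(n)]
    for i in range(n):
        # Brute-force neighbor search: sort all other points by distance.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            # Assumption: an edge i -> j also links j back to i.
            adjacency[i].add(j)
            adjacency[j].add(i)

    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from start.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(component))
    return components

# Hypothetical data: two well-separated groups of three points each.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sample_components = knn_components(sample_points, k=2)
```

Each component could then be handed to the outlier-detection step on its own. For large inputs the brute-force neighbor search would normally be replaced by a spatial index such as a kd-tree.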
FLOW is efficient at finding outlier
groups. For high-dimensional data, it is also efficient at detecting
outliers and outlier groups. Outliers' feature patterns usually differ
markedly from the features of the dominant cluster.
Download: Beta FLOW_3.01.tar (3.2M)
Online Presentation (latest version of IE, allow blocked content.)
(2) LOF (LOF: Identifying Density-Based Local Outliers.)
Description: LOF is a density-based outlier detection algorithm.
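For readers unfamiliar with LOF, the following minimal Python sketch shows how the score is commonly computed. It is not the code in LOF.zip. The name `lof_scores` is hypothetical, and the sketch simplifies by using exactly k neighbors per point (ties broken by index), Euclidean distance, and brute-force search. It requires k smaller than the number of points and does not handle duplicate points specially.

```python
import math

def lof_scores(points, k):
    """Return a Local Outlier Factor score for every point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # indices of the k nearest neighbors of each point
    k_distance = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbor j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        total = sum(reach_dist(i, j) for j in neighbors[i])
        lrd.append(len(neighbors[i]) / total if total > 0 else float("inf"))

    # LOF: average ratio of the neighbors' density to the point's own density.
    scores = []
    for i in range(n):
        ratio_sum = sum(lrd[j] / lrd[i] for j in neighbors[i])
        scores.append(ratio_sum / len(neighbors[i]))
    return scores

# Hypothetical data: four points on a unit square and one point far away.
lof_points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
lof_result = lof_scores(lof_points, k=2)
```

Scores close to 1 indicate a point whose local density matches that of its neighbors; scores well above 1 indicate a likely outlier. In the toy data, the far-away point receives a much larger score than the four square corners.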
Download: LOF.zip (727K)
(3) ROCK (A Robust Clustering Algorithm For Categorical Attributes.)
Description: ROCK is a bottom-up hierarchical
clustering algorithm for categorical attributes.
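ROCK decides which clusters to merge by counting "links", i.e., the number of neighbors two records have in common. The Python sketch below shows only that link-counting step, not the code in ROCK.tar. The names `jaccard` and `rock_links`, the use of Jaccard similarity on sets of attribute values, the threshold value, and the choice not to count a record as its own neighbor are assumptions made for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of categorical values."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def rock_links(records, theta):
    """Return {(i, j): number of common neighbors} for record pairs i < j."""
    n = len(records)
    neighbors = [set() for _ in range(n)]
    # Two records are neighbors when their similarity reaches the threshold.
    for i, j in combinations(range(n), 2):
        if jaccard(records[i], records[j]) >= theta:
            neighbors[i].add(j)
            neighbors[j].add(i)

    links = {}
    for i, j in combinations(range(n), 2):
        common = len(neighbors[i] & neighbors[j])
        if common:
            links[(i, j)] = common
    return links

# Hypothetical records: three overlapping item sets and one unrelated set.
rock_records = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"x", "y", "z"},
]
rock_result = rock_links(rock_records, theta=0.5)
```

A full bottom-up clustering would repeatedly merge the pair of clusters with the best link-based score; that merging loop is omitted here.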
Download: ROCK.tar (1.4M)
(4) KD Tree
Description: A kd-tree is a data structure for
searching for k nearest neighbors.
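As a rough illustration of how a kd-tree answers k-nearest-neighbor queries, here is a minimal Python sketch. It is not the code in KD_0.01.tar. The names `build_kdtree` and `knn`, the dictionary-based node layout, the median split, and the Euclidean distance are assumptions made for illustration.

```python
import heapq
import math

def build_kdtree(points, depth=0):
    """Build a kd-tree by splitting on the median, cycling through axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def knn(tree, query, k):
    """Return the k points nearest to query, closest first."""
    heap = []  # max-heap of the best k so far, stored as (-distance, point)

    def visit(node):
        if node is None:
            return
        d = math.dist(query, node["point"])
        if len(heap) < k:
            heapq.heappush(heap, (-d, node["point"]))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node["point"]))
        # Descend first into the side of the split that contains the query.
        diff = query[node["axis"]] - node["point"][node["axis"]]
        near, far = (
            (node["left"], node["right"]) if diff < 0
            else (node["right"], node["left"])
        )
        visit(near)
        # Visit the other side only if the splitting plane is closer than
        # the current k-th best distance (or fewer than k points were found).
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(tree)
    return [point for _, point in sorted(heap, key=lambda item: -item[0])]

# Hypothetical 2-D data set and query.
kd_points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
kd_tree = build_kdtree(kd_points)
kd_nearest = knn(kd_tree, (9, 2), k=2)
```

The pruning test is what saves work compared with a brute-force scan: whole subtrees are skipped when they cannot contain a closer point than those already found.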
Download: KD_0.01.tar (125K)