Kamal Ali's CLL Stanford & ISLE Web page
- Research: Intelligent Agents in the Icarus Cognitive Architecture: action models, transfer learning
- Consulting: getJar, NICTA, TiVo, Yahoo, Catalist, Elder Research, AnswerLab, others.
- Data Mining modeling, Text mining, Recommendations, Search engine relevance measurement, sampling for databases
- Current email: at yahoo.com: kamal3
My current research interests
- Entity extraction and relationship extraction
- Sampling in databases
- Search engine evaluation.
- Using parse-tree features for text mining, especially entity extraction.
- Bootstrap learning (knowledge transfer) for learning substitutable types.
- Using drives/motivations to infer goals for intelligent agents.
- Committee Voting from Multiple Parse Trees.
- Question Answering in Web Search
- Blackboard/feedback architecture between lexer, stemmer, synonymizer and parser
- Search Engine Web quality results measurement
- Active learning for search engine improvement
- 2010. Konik T., Ali K., Shapiro D., Li N., Stracuzzi D.
Improving Structural Knowledge Transfer with Parametric Adaptation
In FLAIRS 2010.
- 2009. Ali K., Leung K., Konik T., Choi D. and Shapiro D.
Knowledge-Directed Theory Revision. In ILP 2009. Leuven, Belgium.
- 2009. Li N., Stracuzzi D., Cleveland G., Konik T., Shapiro D., Molineaux M., Aha D. and Ali K.
Constructing Game Agents from Video of Human Behavior
In AIIDE 2009. Stanford, CA.
- 2009. Li N., Stracuzzi D., Cleveland G., Langley P., Konik T., Shapiro D., Ali K., Molineaux M. and Aha D.
Learning Hierarchical Skills for Game Agents from
Video of Human Behavior.
In IJCAI 2009 Workshop on Learning Structural Knowledge from Observation. Pasadena, CA.
- 2007. Ali K. and Scarr M.
Modeling Distribution of Clicks for Web Search.
Accepted as a full paper at WWW 2007 (1-in-7 acceptance rate). Banff, Canada.
- 2006. Ali K. and Chang C.
On the relationship between click-rate and relevance for search engines.
DMIE 2006. Prague, Czech Republic.
- 2005. Ali K., Juan Y. and Chang C.
Exploring Cost-Effective Approaches to Human Evaluation of Search Engine Relevance.
ECIR 2005. Santiago de Compostela, Spain.
- 2003. Ali K. and Ketchpel S.
Golden Path Analyzer: Using Divide-and-Conquer to Cluster Web Clickstreams.
In KDD 2003. Washington D.C.
1999 - 2001: Hacking at TiVo
- 2004. Ali K., Van Stam, W. TiVo: Making Show Recommendations using a Distributed Collaborative Filtering Architecture, in KDD 2004. Seattle, WA.
1999: Old ISLE Personal Information
- Position: Research Scientist
- Research Area: Applications of data mining to rooftop detection in satellite images and to fault prevention
1999: Military Rooftop detection from Satellite Images
Report on rooftop detection (90K)
including ROC curve results for a Naive Bayes rooftop-detector using
57 features at the verification level with the ROC evaluation being
done at the building level rather than at the rooftop-candidate
level. Several candidates may try to model one building: it is only
necessary for one of them to be correct for the system to "get" the
building. Furthermore, earlier work on ROC curves at the candidate
level ignored buildings for which no candidates were generated, giving
over-optimistic results. This research is supported by DARPA.
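The building-level scoring described above can be sketched in a few lines; this is a minimal illustration under an assumed candidate record format (the field names are hypothetical, not the BUDDS representation):

```python
# Illustrative sketch of building-level vs. candidate-level scoring for a
# rooftop detector (field names here are assumptions, not the BUDDS format).

def building_level_recall(buildings, candidates):
    """A building counts as detected if ANY candidate modeling it is
    classified positive; buildings with no candidates count as misses."""
    detected = sum(
        1 for b in buildings
        if any(c["predicted_positive"] for c in candidates if c["building"] == b)
    )
    return detected / len(buildings)

def candidate_level_recall(candidates):
    """Older scoring: fraction of true rooftop candidates classified positive.
    Buildings that generated no candidates are invisible to this measure."""
    true_cands = [c for c in candidates if c["is_rooftop"]]
    return sum(c["predicted_positive"] for c in true_cands) / len(true_cands)

# Building C generated no candidates, so candidate-level recall (1.0) is
# over-optimistic compared with building-level recall (2/3).
buildings = ["A", "B", "C"]
candidates = [
    {"building": "A", "is_rooftop": True, "predicted_positive": True},
    {"building": "B", "is_rooftop": True, "predicted_positive": True},
]
```

The toy data shows exactly the over-optimism described above: a building that never produced a candidate drags building-level recall down but leaves candidate-level recall untouched.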
Paper on rooftop detection at the IU98 Monterey Workshop (256K)
In this paper, we report progress on the use of machine learning to
improve the process of rooftop detection in aerial images. We describe
an existing system for building recognition, BUDDS, and identify its
rooftop stage as a target for improvement. We then review the
naive Bayesian classifier, a simple but robust approach to supervised
induction, and the visual interface we developed to ease the labeling
of training data. We present the results of experiments on the rooftop
detection task that reveal improved recognition levels over the
handcrafted BUDDS classifier, then examine the reliability and
speed of the interactive labeling process itself. Finally, we
consider related research and plans for future work.
1997: Papers on data mining from real-world consulting projects
This represents research work done while I was with the data-mining
consultants group at the IBM Almaden Research Center.
- 1997: KDD 97 paper on "Partial Classification using Association Rules" (43K)
Many real-life problems require a partial classification of the data.
We use the term ``partial classification'' to describe the discovery
of models that show characteristics of the data classes, but may not
cover all classes and all examples of any given class.
Complete classification may be infeasible
or undesirable when (1) there are a very large number of class
attributes, (2) most attribute
values are missing, or (3) the class distribution is highly skewed
and the user is interested in understanding the low-frequency class.
In such cases, users often want insights into the data rather
than a complete predictive model for each class. We show how association
rules can be used for partial classification in such domains, and
present two case studies: reducing telecommunications order failures and
detecting redundant medical tests.
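As a rough illustration of the idea, here is a minimal sketch of mining one-item class-association rules, in which classes for which no rule qualifies are simply left uncovered (the dataset and thresholds are invented; the actual KDD-97 system mined multi-item rules over much larger data):

```python
from collections import Counter

def class_association_rules(rows, class_key, min_support=0.2, min_conf=0.7):
    """Mine one-item association rules (attribute=value -> class) that meet
    the support and confidence thresholds. Classes for which no rule
    qualifies are left uncharacterized -- a partial classification."""
    n = len(rows)
    item_counts, pair_counts = Counter(), Counter()
    for row in rows:
        cls = row[class_key]
        for key, val in row.items():
            if key == class_key:
                continue
            item_counts[(key, val)] += 1
            pair_counts[(key, val, cls)] += 1
    rules = []
    for (key, val, cls), joint in pair_counts.items():
        support, conf = joint / n, joint / item_counts[(key, val)]
        if support >= min_support and conf >= min_conf:
            rules.append(((key, val), cls, support, conf))
    return rules
```

On a skewed toy dataset, a class seen only once falls below the support threshold and acquires no rule, so the resulting model characterizes some classes and not others, which is the point of partial classification.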
Pre-1996: Committee machines, Bayesian Model Averaging
My basic research interests are in combining classifiers (multiple
models, committee machines and ensembles) to produce more accurate
classifications. I am also interested in methods to prevent
over-fitted models by taking into account the amount of search
required to find the model.
- 1996: PhD Thesis: Learning Probabilistic Relational Concept Descriptions. (583K)
This dissertation presents methods for increasing the accuracy of
probabilistic classification rules learned from noisy, relational data.
It addresses the problem of learning probabilistic rules from noisy,
``real-world'' data sets and the problem of ``small disjuncts'', in which
rules that apply to rare subclasses have high error rates. It also presents
empirical results on multiple-models research, showing in particular that
the degree to which the models' errors are correlated is negatively related
to the amount of error reduction afforded by the multiple models.
- 1996: Error Reduction through Learning Multiple Descriptions. (149K; MLJ)
This Machine Learning Journal paper (Vol. 24, No. 3) describes
work on learning and combining classifications made by multiple rule
sets. We show that the model-combining methods considered here are
much more effective at reducing error rates in problems where there
are many irrelevant attributes.
We show, however, that on noisy UCI datasets, the multiple-models
approach cannot reduce error as much as it can on the relatively
noise-free but high-dimensionality problems. This leads us to introduce
the term "noise-limited" datasets. We also show that in the limit
(although not realized in the UCI repository), large numbers of
irrelevant attributes have the same negative effect on the
multiple-models approach as noise does.
- A comparison of methods for learning and combining evidence from multiple models. (112K)
This technical report compares three methods for generating multiple models: Stochastic learning,
k-fold partition learning and Bagging.
It also compares four methods for combining evidence from such models to make a
classification, among them Uniform Voting, Bayesian Combination, and Distribution Summation.
We find that k-fold partition learning is inferior to the other methods,
Bagging is better than Stochastic on noisy data sets and Stochastic is better
than Bagging on small data sets.
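Two of the combination schemes named above can be sketched in a few lines; this is a minimal illustration under invented predictions, not the experimental code:

```python
from collections import Counter

def uniform_voting(predictions):
    """Each model casts one vote for its most probable class; plurality wins."""
    votes = Counter(max(dist, key=dist.get) for dist in predictions)
    return votes.most_common(1)[0][0]

def distribution_summation(predictions):
    """Sum the models' class-probability distributions and take the argmax."""
    totals = Counter()
    for dist in predictions:
        totals.update(dist)  # Counter.update adds the mapped values
    return max(totals, key=totals.get)

# Two mildly confident models outvote one very confident model under Uniform
# Voting, while Distribution Summation lets the confident model tip the result.
predictions = [{"a": 0.6, "b": 0.4}, {"a": 0.6, "b": 0.4}, {"a": 0.1, "b": 0.9}]
```

The example makes the design difference concrete: voting discards each model's confidence, while summation weights by it, so the two schemes can disagree on the same inputs.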
- 1996: On the Link between Error Correlation and Error Reduction in Decision Tree Ensembles. (122K; UCI Tech. Report)
This technical report shows that there is a linear relationship between the amount of error reduction due to an
ensemble of decision trees and the degree to which the decision trees make errors in an uncorrelated manner.
It also shows that some of the greatest error reductions occur on domains which have many "gain ties".
Finally, it shows that it is better to learn an ensemble that makes errors in a negatively-correlated
manner rather than in an uncorrelated (statistically independent) manner.
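The link between error correlation and error reduction can be illustrated with a small sketch over per-model error-indicator vectors (the joint-error measure below is a crude stand-in for the report's correlation statistic, and the data are invented):

```python
def majority_vote_error(error_vectors):
    """error_vectors[m][i] is 1 when model m errs on test example i; the
    majority vote errs exactly where more than half of the models err."""
    m, n = len(error_vectors), len(error_vectors[0])
    return sum(1 for i in range(n)
               if sum(ev[i] for ev in error_vectors) > m / 2) / n

def mean_joint_error_rate(error_vectors):
    """Average rate at which pairs of models err on the same example -- a
    rough proxy for how correlated the models' errors are."""
    m, n = len(error_vectors), len(error_vectors[0])
    pairs = [(a, b) for a in range(m) for b in range(a + 1, m)]
    return sum(sum(x & y for x, y in zip(error_vectors[a], error_vectors[b]))
               for a, b in pairs) / (len(pairs) * n)

# Three models that each err on 3 of 10 examples: when the errors fall on
# disjoint examples the majority vote errs nowhere, whereas perfectly
# correlated errors leave the full 0.3 error rate intact.
disjoint = [[1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]]
correlated = [[1, 1, 1, 0, 0, 0, 0, 0, 0, 0]] * 3
```

Each individual model errs at rate 0.3 in both scenarios; only the correlation structure of the errors changes, and with it the ensemble's error reduction.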
- 1995: Classification using Bayes Averaging of Multiple, Relational Rule-based Models. (76K)
This book chapter (in "Learning from Data: Artificial
Intelligence and Statistics, Vol. 5.", Fisher, D. & Lenz, H. (Eds.))
presents a theoretically-sound way of combining classifications from
multiple rule-set models.
A rule-set model learns a rule-set for each class in the data.
A rule-set for a class is a set of rules that all conclude for that class.
The paper shows how to compute the posterior probability of a rule-set model and shows
that using rule-set models in this way leads to higher classification accuracy on relational data base problems.
We also use this approach to learn multiple, recursive concept descriptions.
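The posterior-weighted combination at the heart of the chapter can be sketched as follows (a minimal illustration; computing the rule-set posteriors themselves is the involved part and is not shown):

```python
def bayes_average(models, posteriors, x):
    """Bayesian model averaging: P(class|x) = sum_m P(class|x, m) * P(m|D).
    `models` map an example to a class-probability dict; `posteriors`
    are the (already normalized) model posteriors P(m|D)."""
    combined = {}
    for model, weight in zip(models, posteriors):
        for cls, p in model(x).items():
            combined[cls] = combined.get(cls, 0.0) + weight * p
    return combined
```

For example, with model posteriors 0.7 and 0.3, per-model probabilities of 0.6 and 0.2 for a class combine to 0.7*0.6 + 0.3*0.2 = 0.48.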
- 1994: HYDRA-MM: Learning Multiple Descriptions to Improve Classification Accuracy (journal version) (105K;
International Journal on Artificial Intelligence Tools, Vol. 4, 1 & 2.)
This is a general TAI journal paper showing that learning multiple rule-set
models can improve classification accuracy as compared to a single
model learned on the same training data.
There is also a comparison to the multiple-rules approach, showing that in some circumstances using
multiple rule-sets is better than using multiple rules.
A version also appeared in the
Proceedings of the Tools for Artificial Intelligence Conference (104K).
Research on probabilistic relational models and noise tolerance
- 1993: HYDRA: A Noise-tolerant Relational Learning Algorithm (661K)
This IJCAI 93 paper extends FOIL (a relational learning algorithm) to work on multi-class problems (where there may be
more than two classes in the data).
In addition, we attach a reliability measure to each rule and we learn a rule-set for each class in the data.
Rules may then compete to classify test examples.
We also show that ls-content, a new gain metric, does better than information gain when learning
from noisy data.
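The reliability-and-competition idea can be sketched roughly as below, taking logical sufficiency as the per-rule reliability measure (the exact estimator and the ls-content gain metric in the paper may differ from this simplification):

```python
def logical_sufficiency(fires_pos, n_pos, fires_neg, n_neg):
    """One way to attach a reliability to a rule: logical sufficiency,
    LS = P(rule fires | class) / P(rule fires | other classes), here with
    a Laplace-style correction to avoid division by zero."""
    return ((fires_pos + 1) / (n_pos + 2)) / ((fires_neg + 1) / (n_neg + 2))

def classify(example, rules):
    """rules: (match_fn, class_label, reliability) triples. All matching
    rules compete, and the most reliable matching rule wins."""
    matching = [(rel, cls) for match, cls, rel in rules if match(example)]
    return max(matching)[1] if matching else None
```

A rule that fires on 8 of 10 examples of its class but only 1 of 10 others gets LS = (9/12)/(2/12) = 4.5, so at test time it outcompetes a weaker default rule for the examples it matches.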
- 1992: Reducing the Small Disjuncts Problem by Learning Probabilistic Concept Descriptions.
This paper (in Petsche, T., Hanson, S.J. and Shavlik, J.
"Computational Learning Theory and Natural Learning Systems",
Vol. 3) shows that attaching reliability estimates to rules decreases the error due to the small
disjuncts problem and that it reduces the overall classification error rate.
1991: Theoretical and empirical research on average-case analysis
- Average-case analysis of k-CNF and k-DNF Learning Algorithms
This paper (in Hanson, S.J., Petsche, T., Kearns, M., and Rivest, R.L. (Eds),
"Computational Learning Theory and Natural Learning Systems", Vol. 2) compares the average-case theoretical error-rate of a specific algorithm with the
observed (from Monte-Carlo experiments) error rate of that algorithm as a function of the
number of training examples and the target concept. We show that PAC analysis (which is a
worst-case analysis) yields error estimates that are far too pessimistic and that
average-case analysis gives a good fit to the observed data.
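The Monte-Carlo side of such an experiment can be sketched for monotone conjunctions, a simplification of the k-CNF/k-DNF setting studied in the paper (all parameters below are invented for illustration):

```python
import random

def learn_conjunction(examples):
    """The classic learner for monotone conjunctions: keep exactly the
    attributes that are true in every positive training example."""
    hyp = None
    for attrs, label in examples:
        if label:
            hyp = attrs if hyp is None else hyp & attrs
    return hyp  # None means no positive example was seen

def monte_carlo_error(target, n_attrs, n_train, n_test=1000, trials=30, seed=0):
    """Estimate the learner's expected error rate over random training sets
    drawn uniformly from {0,1}^n_attrs, as a function of n_train."""
    rng = random.Random(seed)
    def draw():
        attrs = frozenset(i for i in range(n_attrs) if rng.random() < 0.5)
        return attrs, target <= attrs  # positive iff the conjunction holds
    total = 0.0
    for _ in range(trials):
        hyp = learn_conjunction([draw() for _ in range(n_train)])
        errors = 0
        for _ in range(n_test):
            attrs, label = draw()
            pred = hyp is not None and hyp <= attrs
            errors += pred != label
        total += errors / n_test
    return total / trials
```

Plotting this estimate against n_train gives the observed learning curve that an average-case analysis tries to predict exactly, and that a worst-case PAC bound typically sits far above.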