Updated TXT to Markdown after GitHub migration

burrsettles · burrsettles · commit 5d0f2d46ab25 · 2015-03-17T14:49:13.000-04:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,27 @@
+## DUALIST Changelog ##
+
+### Version 0.3 - 03/08/2012 ###
+
+ * BUG FIX: Updates to dualist.tui.Util and dualist.tui.Test that fix a bug in
+   testing post-hoc trained models, in the event that the training and test
+   labels are presented in a different order. (Submitted by Stef Sch)
+
+
+### Version 0.2 - 02/10/2012 ###
+
+ * BUG FIX: Gracefully processes non-ASCII characters.
+
+ * BUG FIX: TwitterPipe no longer ignores text after a @USERLINK.
+
+ * BUG FIX: Handles small toy data sets now.
+
+ * Separated code into "core" and "gui" components. Core implements the
+   machine learning business logic, whereas GUI implements the Web-based
+   interactive interface. As a result, DUALIST now requires a build (using ant)
+   after any changes to core.
+
+ * Created a more user-friendly script "dualist" to run commands.
+
+ * Models are now saved in the "models/" directory, indexed by trial name and
+   timestamp. These models can be evaluated offline on totally separate test
+   data, or used to classify other large data sets.
diff --git a/CHANGELOG.txt b/CHANGELOG.txt
diff --git a/README.md b/README.md
@@ -1,39 +1,21 @@
-DUALIST: Utility for Active Learning with Instances and Semantic Terms
-======================================================================
+## DUALIST: Utility for Active Learning with Instances and Semantic Terms ##
 
-Burr Settles
-Carnegie Mellon University
-bsettles@cs.cmu.edu
+_Hooray for recursive acronyms!_
 
-Version 0.3
-March 08, 2012
+Version 0.3 / March 08, 2012
 
-DUALIST is an interactive machine learning system for building classifiers
-quickly. It does so by asking "questions" of the user in the form of both data
-instances (e.g., text documents) and features (e.g., words or phrases). It
-utilizes active and semi-supervised learning to quickly train a multinomial
-naive Bayes classifier for this setting.
+DUALIST is an interactive machine learning system for quickly building classifiers for text processing tasks. It does so by asking "questions" of a human "teacher" in the form of both data instances (e.g., text documents) and features (e.g., words or phrases). It uses [active learning](http://www.cs.cmu.edu/~bsettles/pub/settles.activelearning.pdf) and [semi-supervised learning](http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf) to build text-based classifiers at interactive speed.
 
-NOTICE: This is currently "research-grade" code. It is provided AS-IS without
-any warranties of any kind, expressed or implied, including but not limited to
-the implied warranties of merchantability and fitness for a particular purpose
-and those arising by statute or otherwise in law or from a course of dealing
-or usage of trade. *Whew!*
+Research related to DUALIST is described in these publications:
 
-See LICENSE.txt for licensing information.
-See CHANGELOG.txt for a history of updates.
+  * B. Settles. [Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances](http://aclweb.org/anthology/D/D11/D11-1136.pdf). In _Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)_, pages 1467-1478. ACL, 2011. ([addendum](http://www.cs.cmu.edu/~bsettles/pub/settles.emnlp11addendum.pdf))
+  * B. Settles and X. Zhu. [Behavioral Factors in Interactive Training of Text Classifiers](http://www.cs.cmu.edu/~bsettles/pub/settles.naacl12short.pdf). In _Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT)_, pages 563-567. ACL, 2012.
 
-Citation information and technical details:
+Watch a [demonstration video](http://vimeo.com/21671958) of DUALIST in action!
 
-    B. Settles. Closing the Loop: Fast, Interactive Semi-Supervised Annotation 
-    With Queries on Features and Instances. In Proceedings of the Conference 
-    on Empirical Methods in Natural Language Processing (EMNLP), to appear. 
-    ACL Press, 2011.
+----
 
-
-
-PURPOSE & GOAL
---------------
+### Purpose & Goal ###
 
 The purpose of DUALIST is threefold:
 
@@ -51,25 +33,26 @@ than the multinomial naive Bayes classifier currently used.
 combine multiple "beyond supervised learning" strategies. This ICML workshop
 is related: https://sites.google.com/site/comblearn/
 
+See `LICENSE.txt` for licensing information.
+See `CHANGELOG.md` for a history of updates.
 
 
-INTALLATION + RUNNING THE GUI
------------------------------
+### Installation + Running the Web-Based GUI ###
 
-DUALIST requires Java 1.6 and Python 2.5 to work properly. It ships with most 
-of the dependencies it needs to work, the only exception being the Play! web 
+DUALIST requires Java 1.6 and Python 2.5 to work properly. It ships with most
+of the dependencies it needs to work, the only exception being the Play! web
 framework for Java v1.1+, which can be downloaded here:
 
     http://download.playframework.org/releases/play-1.1.zip
 
-Download and install Play! wherever you want on your system (follow the 
-instructions on their website), and make sure that the "play" command is in 
+Download and install Play! wherever you want on your system (follow the
+instructions on their website), and make sure that the "play" command is in
 your $PATH. Once that is done, all you need to do to run DUALIST is:
 
     $ cd <path-to>/dualist
     $ dualist gui
 
-This will launch a web server on your machine, which you can access by 
+This will launch a web server on your machine, which you can access by
 pointing your favorite browser to:
 
     http://localhost:8080/
@@ -85,10 +68,9 @@ modern hardware, but may be difficult to use beyond that.
 
 
 
-LOGS AND OUTPUT
----------------
+### Logs + Output ###
 
-DUALIST writes a log of user actions in the "results/" directory. Trained 
+DUALIST writes a log of user actions in the "results/" directory. Trained
 models are archived as learning progresses in the "models/" directory. Web
 server system output is written to "application.log" in the root directory.
 
@@ -97,9 +79,7 @@ page at any time to get the current model's label predictions, followed by the
 set of labeled instances and features/terms (prepended by the '#' character).
 
 
-
-USING TRAINED MODELS
---------------------
+### Using trained models ###
 
 Trained models are stored in the "models/" directory. There are two utilities
 for using these models to apply or evaluate these models on data:
@@ -113,22 +93,20 @@ will then output predictions to STDOUT in a tab-delimited format:
     textID  label1  prob1   label2  prob2   ... text-summary
 
 The label predictions are output in rank order, thus column #2 corresponds to
-the model's most likely prediction, and column #3 is its posterior 
-probability, and so on. The text summary in the final columns is a snippet of 
+the model's most likely prediction, and column #3 is its posterior
+probability, and so on. The text summary in the final columns is a snippet of
 the first 150 characters in the instance.
 
 The other utility, for evaluation, is:
 
     $ dualist test [model] [test-set]
 
-This will produce various statistics about the model and data set, as well as 
-the model's accuracy compared to a 10-fold cross-validation baseline using the 
+This will produce various statistics about the model and data set, as well as
+the model's accuracy compared to a 10-fold cross-validation baseline using the
 same test set.
 
 
-
-DATA FILE FORMATS
------------------
+### Data File Formats ###
 
 In either explore or experiment mode, DUALIST accepts data sets as a single
 ZIP file. In "explore" mode, data files can be an arbitrary structure within
@@ -166,34 +144,36 @@ each subsequent element is a contextual feature, represented by
 shape, affixes, etc.) are induced automatically.
 
 
-
-CUSTOMIZATION
--------------
+### Customization ###
 
 To create your own data processing pipelines, follow these steps:
 
-    1. Familiarize yourself with the "cc.mallet.pipe" package API
+ 1. Familiarize yourself with the `cc.mallet.pipe` package API
     (http://mallet.cs.umass.edu/api/)
-    
-    2. Implement a new pipe in the "dualist.pipes" package of the DUALIST
-    codebase (use "DocumentPipe.java" as an example).
-    
-    3. Edit the following files to incorporate the new pipeline into the 
-    web-based user interface:
-        core/src/dualist/tui/Util.java (the "getPipe" method)
-        gui/app/views/Applications/experiment.html
-        gui/app/views/Applications/explore.html
-    
-    4. Changes made to the "core/" section of the codebase must be manually 
-    compiled by typing the "ant" command. You may need to stop and restart the 
-    GUI in this case.
-
-    5. Changes made to the "gui/" section of the codebase are re-compiled on 
-    the fly by the Play! web framework.
-    
-    6. For more advanced deployment of the web-based GUI, you will probably 
-    need to edit the file "gui/app/conf/application.conf". Refer the the Play! 
-    documentation for more details: 
-    http://www.playframework.org/documentation/1.1/production
+
+ 2. Implement a new pipe in the `dualist.pipes` package of the DUALIST
+    codebase (use `DocumentPipe.java` as an example).
+
+ 3. Edit the following files to incorporate the new pipeline into the
+    web-based user interface:
+     * `core/src/dualist/tui/Util.java` (the `getPipe` method)
+     * `gui/app/views/Applications/experiment.html`
+     * `gui/app/views/Applications/explore.html`
+
+ 4. Changes made to the `core/` section of the codebase must be manually
+    compiled by typing the `ant` command. You may need to stop and restart the
+    GUI in this case.
+
+ 5. Changes made to the `gui/` section of the codebase are re-compiled on
+    the fly by the Play! web framework.
+
+ 6. For more advanced deployment of the web-based GUI, you will probably
+    need to edit the file `gui/app/conf/application.conf`. Refer to the Play!
+    documentation for more details:
+    http://www.playframework.org/documentation/1.1/production
 
 Good luck, and have fun!
+
+---
+
+This work is supported in part by DARPA (under contract numbers FA8750-08-1-0009 and AF8750-09-C-0179), the National Science Foundation (IIS-0968487), and Google. Any opinions, findings and conclusions or recommendations expressed in this material are the authors' and do not necessarily reflect those of the sponsors.
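
The tab-delimited prediction format described under "Using trained models" (`textID  label1  prob1  label2  prob2  ...  text-summary`) is straightforward to consume downstream. The following is a minimal sketch in Python 3, not part of DUALIST itself: the function name `parse_prediction_line` is hypothetical, and it assumes the text summary contains no tab characters, so every row has an even number of fields.

```python
def parse_prediction_line(line):
    """Split one prediction row into (text_id, ranked_labels, summary).

    Assumed layout (taken from the README): textID, then label/probability
    pairs in rank order, then a text summary as the final field.
    """
    fields = line.rstrip("\n").split("\t")
    # textID + at least one (label, prob) pair + summary = 4 fields minimum;
    # each extra pair adds two fields, so the total must be even.
    if len(fields) < 4 or len(fields) % 2 != 0:
        raise ValueError("unexpected number of fields: %d" % len(fields))

    text_id = fields[0]
    summary = fields[-1]
    # Pair up label/probability columns between the ID and the summary.
    ranked_labels = [
        (fields[i], float(fields[i + 1]))
        for i in range(1, len(fields) - 1, 2)
    ]
    return text_id, ranked_labels, summary


if __name__ == "__main__":
    # Hypothetical row, for illustration only.
    row = "doc42\tpos\t0.9\tneg\t0.1\tGreat movie, would watch again"
    text_id, ranked_labels, summary = parse_prediction_line(row)
    best_label, best_prob = ranked_labels[0]
    print(text_id, best_label, best_prob)
```

Because predictions are written in rank order, the first pair in `ranked_labels` is the model's most likely label. If real summaries can contain tabs, the even-field check above would need to be relaxed.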