Commit 5d0f2d4

burrsettles committed Mar 17, 2015

updated TXT to Markdown after Github migration

1 parent 9640714

3 files changed, 80 insertions(+), 103 deletions(-)

CHANGELOG.md (+27)

@@ -0,0 +1,27 @@
+## DUALIST Changelog ##
+
+### Version 03 - 03/08/2012 ###
+
+* BUG FIX: Updates to dualist.tui.Util and dualist.tui.Test that fix a bug in
+testing post-hoc trained models, in the event that the training and test
+labels are presented in a different order. (Submitted by Stef Sch)
+
+
+### Version 02 - 02/10/2012 ###
+
+* BUG FIX: Gracefully processes non-ASCII characters.
+
+* BUG FIX: TwitterPipe no longer ignores text after a @USERLINK.
+
+* BUG FIX: Handles small toy data sets now.
+
+* Separated code into "core" and "gui" components. Core implements the
+machine learning business logic, whereas GUI implements the Web-based
+interactive interface. As a result, DUALIST now requires a build (using ant)
+after any changes to core.
+
+* Created a more user-friendly script "dualist" to run commands.
+
+* Models are now saved in the "models/" directory, indexed by trial name and
+timestamp. These models can be evaluated offline on totally separate test
+data, or used to classify other large data sets.
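
The Version 03 entry describes a bug in testing post-hoc trained models when the training and test labels are presented in a different order. The Python sketch below illustrates that general class of bug, assuming labels are compared by their position in two separately built label lists. It is not the actual `dualist.tui.Util` or `dualist.tui.Test` code; `build_index_map` and `accuracy` are hypothetical names. The remedy shown is to translate each predicted label index through the label *name* before comparing.

```python
from typing import Dict, List


def build_index_map(train_labels: List[str], test_labels: List[str]) -> Dict[int, int]:
    """Map each training-label index to the test-label index with the same name.

    Hypothetical helper: assumes both label lists contain the same names,
    possibly in a different order.
    """
    test_pos = {name: i for i, name in enumerate(test_labels)}
    missing = [name for name in train_labels if name not in test_pos]
    if missing:
        raise ValueError(f"labels missing from test set: {missing}")
    return {i: test_pos[name] for i, name in enumerate(train_labels)}


def accuracy(pred_train_idx: List[int], gold_test_idx: List[int],
             index_map: Dict[int, int]) -> float:
    """Accuracy after translating predictions into the test set's label indices."""
    if len(pred_train_idx) != len(gold_test_idx):
        raise ValueError("prediction and gold lists differ in length")
    if not gold_test_idx:
        return 0.0
    hits = sum(index_map[p] == g for p, g in zip(pred_train_idx, gold_test_idx))
    return hits / len(gold_test_idx)


if __name__ == "__main__":
    train_labels = ["sports", "politics"]  # label order seen at training time
    test_labels = ["politics", "sports"]   # same labels, different order
    docs = ["sports", "sports", "politics"]
    # Gold labels as indices into test_labels.
    gold = [test_labels.index(label) for label in docs]
    # Perfect predictions, expressed as indices into train_labels.
    preds = [train_labels.index(label) for label in docs]

    naive = sum(p == g for p, g in zip(preds, gold)) / len(gold)
    aligned = accuracy(preds, gold, build_index_map(train_labels, test_labels))
    print(f"naive (index-by-index): {naive:.2f}")     # 0.00
    print(f"aligned (by label name): {aligned:.2f}")  # 1.00
```

Although every prediction is correct, the index-by-index comparison scores 0.00, while the name-aligned comparison scores 1.00.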

CHANGELOG.txt (-30)

This file was deleted.

README.txt → README.md (+53 -73)
@@ -1,39 +1,21 @@
-DUALIST: Utility for Active Learning with Instances and Semantic Terms
-======================================================================
+## DUALIST: Utility for Active Learning with Instances and Semantic Terms ##
 
-Burr Settles
-Carnegie Mellon University
-bsettles@cs.cmu.edu
+_Hooray for recursive acronyms!_
 
-Version 0.3
-March 08, 2012
+Version 0.3 / March 08, 2012
 
-DUALIST is an interactive machine learning system for building classifiers
-quickly. It does so by asking "questions" of the user in the form of both data
-instances (e.g., text documents) and features (e.g., words or phrases). It
-utilizes active and semi-supervised learning to quickly train a multinomial
-naive Bayes classifier for this setting.
+DUALIST is an interactive machine learning system for quickly building classifiers for text processing tasks. It does so by asking "questions" of a human "teacher" in the form of both data instances (e.g., text documents) and features (e.g., words or phrases). It uses [active learning](http://www.cs.cmu.edu/~bsettles/pub/settles.activelearning.pdf) and [semi-supervised learning](http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf) to build text-based classifiers at interactive speed.
 
-NOTICE: This is currently "research-grade" code. It is provided AS-IS without
-any warranties of any kind, expressed or implied, including but not limited to
-the implied warranties of merchantability and fitness for a particular purpose
-and those arising by statute or otherwise in law or from a course of dealing
-or usage of trade. *Whew!*
+Research related to DUALIST is described in these publications:
 
-See LICENSE.txt for licensing information.
-See CHANGELOG.txt for a history of updates.
+* B. Settles. [Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances](http://aclweb.org/anthology/D/D11/D11-1136.pdf). In _Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)_, pages 1467-1478. ACL, 2011. ([addendum](http://www.cs.cmu.edu/~bsettles/pub/settles.emnlp11addendum.pdf))
+* B. Settles and X. Zhu. [Behavioral Factors in Interactive Training of Text Classifiers](http://www.cs.cmu.edu/~bsettles/pub/settles.naacl12short.pdf). In _Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT)_, pages 563-567. ACL, 2012.
 
-Citation information and technical details:
+Watch a [demonstration video](http://vimeo.com/21671958) of DUALIST in action!
 
-B. Settles. Closing the Loop: Fast, Interactive Semi-Supervised Annotation
-With Queries on Features and Instances. In Proceedings of the Conference
-on Empirical Methods in Natural Language Processing (EMNLP), to appear.
-ACL Press, 2011.
+----
 
-
-
-PURPOSE & GOAL
----------------
+### Purpose & Goal ###
 
 The purpose of DUALIST is threefold:
 
@@ -51,25 +33,26 @@ than the multinomial naive Bayes classifier currently used.
 combine multiple "beyond supervised learning" strategies. This ICML workshop
 is related: https://sites.google.com/site/comblearn/
 
+See `LICENSE.txt` for licensing information.
+See `CHANGELOG.md` for a history of updates.
 
 
-INTALLATION + RUNNING THE GUI
-------------------------------
+### Installation + Running the Web-Based GUI ###
 
-DUALIST requires Java 1.6 and Python 2.5 to work properly. It ships with most
-of the dependencies it needs to work, the only exception being the Play! web
+DUALIST requires Java 1.6 and Python 2.5 to work properly. It ships with most
+of the dependencies it needs to work, the only exception being the Play! web
 framework for Java v1.1+, which can be downloaded here:
 
 http://download.playframework.org/releases/play-1.1.zip
 
-Download and install Play! wherever you want on your system (follow the
-instructions on their website), and make sure that the "play" command is in
+Download and install Play! wherever you want on your system (follow the
+instructions on their website), and make sure that the "play" command is in
 your $PATH. Once that is done, all you need to do to run DUALIST is:
 
 $ cd <path-to>/dualist
 $ dualist gui
 
-This will launch a web server on your machine, which you can access by
+This will launch a web server on your machine, which you can access by
 pointing your favorite browser to:
 
 http://localhost:8080/
@@ -85,10 +68,9 @@ modern hardware, but may be difficult to use beyond that.
 
 
 
-LOGS AND OUTPUT
-----------------
+### Logs + Output ###
 
-DUALIST writes a log of user actions in the "results/" directory. Trained
+DUALIST writes a log of user actions in the "results/" directory. Trained
 models are archived as learning progresses in the "models/" directory. Web
 server system output is written to "application.log" in the root directory.
 
@@ -97,9 +79,7 @@ page at any time to get the current model's label predictions, followed by the
 set of labeled instances and features/terms (prepended by the '#' character).
 
 
-
-USING TRAINED MODELS
----------------------
+### Using trained models ###
 
 Trained models are stored in the "models/" directory. There are two utilities
 for using these models to apply or evaluate these models on data:
@@ -113,22 +93,20 @@ will then output predictions to STDOUT in a tab-delimted format:
 textID label1 prob1 label2 prob2 ... text-summary
 
 The label predictions are output in rank order, thus column #2 corresponds to
-the model's most likely prediction, and column #3 is its posterior
-probability, and so on. The text summary in the final columns is a snippet of
+the model's most likely prediction, and column #3 is its posterior
+probability, and so on. The text summary in the final columns is a snippet of
 the first 150 characters in the instance.
 
 The other utility, for evaluation, is:
 
 $ dualist test [model] [test-set]
 
-This will produce various statistics about the model and data set, as well as
-the model's accuracy compared to a 10-fold cross-validation baseline using the
+This will produce various statistics about the model and data set, as well as
+the model's accuracy compared to a 10-fold cross-validation baseline using the
 same test set.
 
 
-
-DATA FILE FORMATS
-------------------
+### Data File Formats ###
 
 In either explore or experiment mode, DUALIST accepts data sets as a single
 ZIP file. In "explore" mode, data files can be an arbitrary structure within
@@ -166,34 +144,36 @@ each subsequent element is a contextual feature, represented by
 shape, affixes, etc.) are induced automatically.
 
 
-
-CUSTOMIZATION
---------------
+### Customization ###
 
 To create your own data processing pipelines, follow these steps:
 
-1. Familiarize yourself with the "cc.mallet.pipe" package API
+1. Familiarize yourself with the `cc.mallet.pipe` package API
 (http://mallet.cs.umass.edu/api/)
-
-2. Implement a new pipe in the "dualist.pipes" package of the DUALIST
-codebase (use "DocumentPipe.java" as an example).
-
-3. Edit the following files to incorporate the new pipeline into the
-web-based user interface:
-core/src/dualist/tui/Util.java (the "getPipe" method)
-gui/app/views/Applications/experiment.html
-gui/app/views/Applications/explore.html
-
-4. Changes made to the "core/" section of the codebase must be manually
-compiled by typing the "ant" command. You may need to stop and restart the
-GUI in this case.
-
-5. Changes made to the "gui/" section of the codebase are re-compiled on
-the fly by the Play! web framework.
-
-6. For more advanced deployment of the web-based GUI, you will probably
-need to edit the file "gui/app/conf/application.conf". Refer the the Play!
-documentation for more details:
-http://www.playframework.org/documentation/1.1/production
+
+2. Implement a new pipe in the `dualist.pipes` package of the DUALIST
+codebase (use `DocumentPipe.java` as an example).
+
+3. Edit the following files to incorporate the new pipeline into the
+web-based user interface:
+* `core/src/dualist/tui/Util.java` (the "getPipe" method)
+* `gui/app/views/Applications/experiment.html`
+* `gui/app/views/Applications/explore.html`
+
+4. Changes made to the `core/` section of the codebase must be manually
+compiled by typing the "ant" command. You may need to stop and restart the
+GUI in this case.
+
+5. Changes made to the `gui/` section of the codebase are re-compiled on
+the fly by the Play! web framework.
+
+6. For more advanced deployment of the web-based GUI, you will probably
+need to edit the file `gui/app/conf/application.conf`. Refer the the Play!
+documentation for more details:
+http://www.playframework.org/documentation/1.1/production
 
 Good luck, and have fun!
+
+---
+
+This work is supported in part by DARPA (under contract numbers FA8750-08-1-0009 and AF8750-09-C-0179), the National Science Foundation (IIS-0968487), and Google. Any opinions, findings and conclusions or recommendations expressed in this material are the authors' and do not necessarily reflect those of the sponsors.
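
The README text in this diff describes the model-application utility's output as tab-delimited columns `textID label1 prob1 label2 prob2 ... text-summary`, with labels in rank order. A minimal Python sketch for reading such lines follows. It assumes fields are separated by single tab characters and that each label is followed by a numeric posterior. Because the README places the summary in the "final columns", the parser treats everything after the last label/probability pair as the summary. `Prediction` and `parse_line` are hypothetical helpers, not part of DUALIST.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    text_id: str
    ranked: List[Tuple[str, float]]  # (label, posterior), most likely first
    summary: str

    @property
    def top_label(self) -> str:
        """Column #2 in the README's numbering: the most likely label."""
        return self.ranked[0][0]


def _is_float(s: str) -> bool:
    try:
        float(s)
        return True
    except ValueError:
        return False


def parse_line(line: str) -> Prediction:
    """Parse one assumed tab-delimited prediction line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"too few columns: {line!r}")
    ranked: List[Tuple[str, float]] = []
    i = 1
    # Consume (label, probability) pairs while the second field is numeric.
    while i + 1 < len(fields) and _is_float(fields[i + 1]):
        ranked.append((fields[i], float(fields[i + 1])))
        i += 2
    if not ranked:
        raise ValueError(f"no label/probability pairs: {line!r}")
    # Whatever remains is the text summary (re-joined if it spans columns).
    return Prediction(fields[0], ranked, "\t".join(fields[i:]))


if __name__ == "__main__":
    # Hypothetical output line; real text IDs and labels will differ.
    sample = "doc17\tsports\t0.91\tpolitics\t0.09\tThe home team won in overtime"
    pred = parse_line(sample)
    print(pred.top_label, pred.ranked[0][1], pred.summary)
```

The greedy pair-matching is a design trade-off: it tolerates a summary that spans several columns, but a summary column followed by a purely numeric column would be misread as one more label pair.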

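The `dualist test` description in the README compares the model's accuracy to "a 10-fold cross-validation baseline using the same test set" but does not spell out the procedure. The sketch below shows standard k-fold cross-validation, assuming the baseline partitions the labeled test set into 10 folds, trains on nine, scores the held-out fold, and averages the ten accuracies. DUALIST's own learner is a multinomial naive Bayes classifier. To stay self-contained, the sketch plugs in a hypothetical majority-class `majority_trainer` instead. `k_fold_indices` and `cross_val_accuracy` are likewise illustrative names.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (text, label)
Predictor = Callable[[str], str]
Trainer = Callable[[Sequence[Example]], Predictor]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_val_accuracy(data: Sequence[Example], train: Trainer,
                       k: int = 10, seed: int = 0) -> float:
    """Mean held-out accuracy: train on k-1 folds, score the remaining fold."""
    scores = []
    for fold in k_fold_indices(len(data), k, seed):
        held_out = set(fold)
        train_set = [ex for i, ex in enumerate(data) if i not in held_out]
        predict = train(train_set)
        correct = sum(predict(data[i][0]) == data[i][1] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def majority_trainer(examples: Sequence[Example]) -> Predictor:
    """Hypothetical stand-in learner: always predicts the most frequent label."""
    top = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda text: top


if __name__ == "__main__":
    # Toy data: 30 "pos" and 10 "neg" documents.
    data = [(f"doc{i}", "pos" if i % 4 else "neg") for i in range(40)]
    print(f"{cross_val_accuracy(data, majority_trainer, k=10):.2f}")  # 0.75
```

On this toy data the majority baseline always predicts `pos`, and because the ten folds are equal in size, the mean fold accuracy equals the overall `pos` rate of 0.75. Passing the trainer in as a function keeps the fold logic independent of the learner.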