Contents
Introduction
Information retrieval and NLP
The program
High-level Overview
User Interface
Lexical analysis
Strip Tags (HTML)
Morphological analysis
Simplification of sentences by removal of stop words
Computation of uni-, bi- and trigrams
Generation of likely word sequences to summarize document contents
Experimental procedure and Results
Shortcomings
End of sentences not recognized
N-grams
Data structure, term vs list
Generate most likely sentences
Conclusion
References
Appendix A: Program listing
w.pl
sr.pl
Appendix B: Selected results
Appendix C: Text file for analysis Emdr.txt
Appendix D: Trace Printouts
removehtmltokens
cleardata
flattenlist
Unigram_test.txt contains:
Unigram_test_out.txt contains:
Trace on running this sample
One aspect of Artificial Intelligence concerns natural language processing (NLP). The programming language Prolog has frequently been used for NLP because of its powerful pattern matching and unification. Limited forms of NLP are used to analyse documents for information retrieval purposes, for example, to find a set of documents on the World Wide Web that satisfies a search query. The aims of this project are to:
(1) understand the basics of document analysis (text processing) for information retrieval,
(2) learn more about NLP, and
(3) write a program in Prolog to perform a few document analysis experiments on a text document.
I achieve these aims by first giving a brief discussion of the salient theoretical issues related to information retrieval and NLP. I then look at the program and its functional areas in some detail. Thereafter, I discuss example input and the results generated, followed by a critical analysis of this project.
Information retrieval is the task of finding documents that are relevant to a user’s query (Russell & Norvig, 2003, p. 840). This requires some form of content representation that relates the query to the contents of the documents. Documents can be modelled in a number of different ways, each varying in its complexity and flexibility. In this project I restrict the document analysis task to three types of probabilistic language model, namely unigram, bigram and trigram models of a document.
A unigram model (Russell & Norvig, 2003, p. 835) assigns a probability $P(w)$ to each word $w$ in the document, where the probability is estimated from the frequency of occurrence of that word in the document. The bigram model assigns a probability $P(w_{i+1} \mid w_i)$ to each sequence of words $(w_i, w_{i+1})$, given the occurrence of the previous word $w_i$. The trigram model similarly assigns a probability $P(w_{i+2} \mid w_i, w_{i+1})$ based on the previous two words. In general, n-gram document models can be constructed in this way for sequences of length n.
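Written out, the usual relative-frequency (maximum-likelihood) estimates behind these three models are as follows. The count notation $C(\cdot)$ and the total number of words $N$ are introduced here for illustration only; the example values are the Option 1 counts for “the” and “the patient” reported in the results.

```latex
P(w) = \frac{C(w)}{N}, \qquad
P(w_{i+1} \mid w_i) = \frac{C(w_i\,w_{i+1})}{C(w_i)}, \qquad
P(w_{i+2} \mid w_i, w_{i+1}) = \frac{C(w_i\,w_{i+1}\,w_{i+2})}{C(w_i\,w_{i+1})}

% Example (Option 1): C(the) = 326, N = 5130, C(the patient) = 23
P(\text{the}) = \frac{326}{5130} \approx 0.064, \qquad
P(\text{patient} \mid \text{the}) = \frac{23}{326} \approx 0.071
```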
Before one can count words in a document to determine any n-gram model, however, the document must be pre-processed in order to locate words and to separate the relevant words from other symbols, such as punctuation or any other special notation used in the document. This is typically called lexical analysis, and precedes NLP proper. In general, NLP is divided into five levels (Covington, 1994, pp. 5-6): phonology (sound), morphology (word formation), syntax (sentence structure), semantics (meaning) and pragmatics (use of language in context). Sound can be ignored in this project since we deal with text only and not speech. For information retrieval, one can restrict oneself to morphology, since the n-gram models attempt to capture the syntax (word order), semantics and pragmatics of the document text statistically.
Another simplification of document text is the removal of words that one believes contribute little to the overall document model (Covington, 1994, p. 17). Russell and Norvig (2003, p. 846) call them “stop words.” Examples include conjunctions, prepositions and other fillers. Sometimes the document medium also introduces words that are of no use for the analysis. In this project, HTML tags are an example of such medium-related “stop words.”
After this short overview of the processing stages for document analysis, the main program can be described. References to programs and related literature by other authors are discussed in the relevant subsections below.
The main purpose of this document analysis program is to count the number of useful words in the document in various ways, and to use NLP to (hopefully) improve the analysis.
The document analysis program consists of the following processing stages.
1. Naming the input document file and output file for results
2. Specifying parameters for processing the text
3. Lexical analysis
4. Morphological analysis
5. Simplification of sentences by removal of stop words
6. Computation of uni-, bi- and trigrams
7. Generation of likely word sequences to summarize the document contents
8. Saving the word frequencies
The output information generated from processing a particular document can then be inspected to see how well this automatic analysis corresponds to the semantics of the document. The major stages (3-7) are discussed in the subsections below.
The analytical NLP techniques that form the focus of this project would normally be implemented as background processes within a wider query system. As a result, development of a user interface was deemed outside the scope of this project.
The user is required to run the application by typing the following run command at the query prompt:
run(<<morph-analysis>>, <<simplify-analysis>>, <<html-file>>, <<Input-Filename>>, <<Output-Filename>>).
Where:
<<morph-analysis>> is either “yes” or “no”, depending on whether the application should apply morphological analysis to the document(s) as part of the processing.
<<simplify-analysis>> is either “yes” or “no”, depending on whether the application should apply simplification (removal of stop words) to the document(s) as part of the processing.
<<html-file>> is either “yes” or “no”, depending on whether the input file is an HTML file, in which case the HTML tags must be removed before analysis.
<<Input-Filename>> is the name of the document that the user wishes to process.
<<Output-Filename>> is the name of the text file to generate, containing the output and analysis of the document.
(In the program, any value other than yes is treated as no.)
Note: many documents can be processed using this procedure. Each time another document is processed, the uni-, bi- and trigram counters are incremented, so the final output file contains the cumulative analysis of all the documents processed.
However, I have included the predicate cleardata, which clears all the previously calculated and stored totals so that the document analysis can start afresh. This predicate takes no parameters and is simply called as follows:
cleardata.
The lexical analysis in this project is accomplished with the tokenizer et.pl written by Covington. Tokenization is the process of breaking up a text file into words and/or other significant units (Covington, 2003, p. 1). For example, the sentence “Mike’s dog buried the £100 000!” would be broken down into a series of tokens:
[ w([m,i,k,e]), w([s]), w([d,o,g]), w([b,u,r,i,e,d]), w([t,h,e]), s('£'), n([1,0,0,0,0,0]), s('!') ]
Here each unit is categorised according to whether it is a word w(…), a special character s(…) or a number n(…). All capitals are converted to lower case, and all white space (blanks and line breaks) is removed, as is, in this case, the apostrophe.
A further step in the lexical analysis is the removal of all numbers and special characters, on the assumption that they are irrelevant to the core semantic meaning of the text and can therefore safely be ignored. However, before that is done, we need to deal with the document mark-up tags if the input file is an HTML file. This is discussed in the next section.
After removal of the special characters, the numbers and the functors (w, s and n) used to mark the different kinds of symbols, the example above becomes:
[ mike, s, dog, buried, the ]
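The filtering step that produces this list can be sketched as follows. This is a minimal sketch assuming the token format shown above; the name keep_words/2 is hypothetical and this is not the program's own tokens_words/2.

```prolog
% keep_words(+Tokens, -Words)
% Hypothetical sketch: keep only the word tokens w(Chars) produced by the
% tokenizer, turn each character list into an atom, and drop special
% symbols s(_) and numbers n(_).
keep_words([], []).
keep_words([w(Chars)|Tokens], [Word|Words]) :-
    !,
    atom_chars(Word, Chars),        % e.g. [m,i,k,e] -> mike
    keep_words(Tokens, Words).
keep_words([_|Tokens], Words) :-    % s(_), n(_) or any other token: skip it
    keep_words(Tokens, Words).
```

For example, ?- keep_words([w([d,o,g]), s('!'), n([1,0,0])], W). gives W = [dog].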
Documents in HTML format include special sequences of characters to mark-up the content for proper display and to encode various kinds of information about the document. For example,
<html>
<head>
<title>some string</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
tags encode information about the document which is not part of its content, and should therefore not be tokenized or used for text analysis. The procedure removehtmltokens removes all symbols from a ‘<’ up to and including the ‘>’ that marks the end of an HTML tag. Sequences of different lengths can occur in between, but the most frequently occurring mark-up tags have three or four symbols, e.g. <title> and </title>. After tokenization they are represented as
[ …, s('<'), w([t,i,t,l,e]), s('>'), …] and [ …, s('<'), s('/'), w([t,i,t,l,e]), s('>'), …]
and can be removed by matching these sequences at the beginning (head) of the list of tokens. To match any word, indicated by w(…) in such a sequence, the anonymous variable is used.
The next part of this procedure simply removes all symbols after recognizing ‘<’ until ‘>’ is found in the list of tokens.
In an ideal world, morphological analysis would reduce each inflected word to its base form. For example, words like ‘testing’, ‘days’, ‘did’ and ‘biggest’ need to be reduced to ‘test’, ‘day’, ‘do’ and ‘big’. Unfortunately, Schlachter’s morphological analyser (pronto_morph_engine.pl), used to supplement this experiment, has its limitations. This analyser is based on a context-free grammar, so for some words it produces a single analysis and for others several alternative analyses on backtracking. For words that are found in the irregular word list, and thus have only one possible analysis, the correct results are given. If, however, the word is ambiguous (can be analysed into different morphemes depending on the context), the application takes the first result, which (almost) invariably is an exact replica of the presenting word. For example, ‘testing’ returns ‘testing’.
As a consequence of these limitations, not all the correct morphemes are generated in the output. This is, however, the lesser of two evils, as using all the alternative forms that a context-free morphological analyser returns for a word would incorrectly inflate the occurrence counts of each word in our calculations!
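For comparison, a crude suffix-stripping stemmer avoids the ambiguity problem because it always returns exactly one form per word. The sketch below is a hypothetical illustration, not Porter's algorithm and not part of ProNTo_Morph; the suffix list and the three-letter minimum stem are assumptions chosen for the example.

```prolog
:- use_module(library(lists)).   % append/3

% suffix_rule(Suffix, Replacement): tried in this order (illustrative list)
suffix_rule(ing, '').
suffix_rule(ed, '').
suffix_rule(ies, y).
suffix_rule(s, '').

% naive_stem(+Word, -Stem)
% Hypothetical suffix stripper: removes the first listed suffix that
% leaves a stem of at least three letters; words matching no rule are
% returned unchanged. Exactly one answer per word, so counts are not inflated.
naive_stem(Word, Stem) :-
    atom_chars(Word, Chars),
    suffix_rule(Suffix, Replacement),
    atom_chars(Suffix, SuffixChars),
    append(StemChars, SuffixChars, Chars),   % Word ends in Suffix
    length(StemChars, Len),
    Len >= 3,
    !,
    atom_chars(Replacement, ReplacementChars),
    append(StemChars, ReplacementChars, NewChars),
    atom_chars(Stem, NewChars).
naive_stem(Word, Word).
```

This maps ‘testing’ to ‘test’ and ‘days’ to ‘day’, but it cannot handle irregular forms such as ‘did’ or ‘biggest’, and it over-strips some words (e.g. ‘this’ becomes ‘thi’).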
The role of simplification is to reduce the number of words included in the model by removing all the non-useful stop words. This reduces the words to be processed to meaningful words only, i.e. words one might be interested in as part of a query. Since we want to establish the frequency of the words contained in the text, it makes sense to remove words such as ‘and’, ‘the’ and perhaps single letters like ‘a’, ‘i’ and ‘s’, which do not add to the semantic value of the document but are likely to have the highest frequency within it.
Thus, for the example above, the simplification process would return the list of words without ‘s’ and ‘the’:
[ mike, dog, buried]
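A minimal sketch of stop-word removal, assuming a fixed stop list. The program itself applies the rewriting rules sr/2 from sr.pl, which are not reproduced here; the stop list and the names stop_word/1 and remove_stop_words/2 are illustrative only.

```prolog
% Illustrative stop list (hypothetical, much shorter than a real one).
stop_word(a).
stop_word(and).
stop_word(i).
stop_word(in).
stop_word(of).
stop_word(s).
stop_word(the).
stop_word(to).

% remove_stop_words(+Words, -Content): drop every listed stop word,
% keeping the remaining words in their original order.
remove_stop_words([], []).
remove_stop_words([W|Ws], Content) :-
    stop_word(W),
    !,
    remove_stop_words(Ws, Content).
remove_stop_words([W|Ws], [W|Content]) :-
    remove_stop_words(Ws, Content).
```

Keeping the original order matters, because the bigram and trigram counts that follow are computed over the simplified list.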
The frequency calculations in w.pl are the heart of the experiment. They are done primarily by the unigram, bigram and trigram predicates. Each of these processes the list of key words distilled by the preceding procedures and counts each occurrence of each word, each pair of words and each triplet. By arranging each of these as a list in descending order of frequency, one can identify the most frequently occurring words or sequences of words. Although the actual probabilities are not important for this experiment, one can (as usual) estimate a probability as the frequency of a word sequence divided by the total number of occurrences of all sequences of that length. So for single words, the frequency of each word is stored in the predicate
ug(Word, Frequency)
and the predicate
ug(Count),
where Count is initially zero, holds the total number of words encountered. So the probability of occurrence of a word is its relative frequency Frequency / Count.
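The bookkeeping described above can be sketched with Prolog's dynamic database. This is a minimal sketch only: the program's own unigram/1 is not reproduced here, and count_unigrams/1, increment_word/1 and unigram_probability/2 are hypothetical names; only the ug/1 and ug/2 facts follow the text.

```prolog
:- dynamic ug/1, ug/2.

ug(0).                          % running total of words counted so far

% count_unigrams(+Words)
% Hypothetical sketch: for every word, add one to ug(Word, Frequency)
% and to the total ug(Count). Counts accumulate across calls.
count_unigrams([]).
count_unigrams([W|Ws]) :-
    increment_word(W),
    retract(ug(N0)),
    N is N0 + 1,
    assertz(ug(N)),
    count_unigrams(Ws).

% increment_word(+W): create ug(W, 1) or replace ug(W, F) by ug(W, F+1)
increment_word(W) :-
    (   retract(ug(W, F0))
    ->  F is F0 + 1
    ;   F = 1
    ),
    assertz(ug(W, F)).

% unigram_probability(+W, -P): relative frequency Frequency / Count
unigram_probability(W, P) :-
    ug(W, F),
    ug(N),
    N > 0,
    P is F / N.
```

After ?- count_unigrams([the,dog,the])., the database holds ug(the,2), ug(dog,1) and ug(3).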
Similarly, the predicates bg(Countbg) and bg(Word1,Word2,Frequencybg) are used to remember the total number of sequences of length two (ordered pairs) and how often each pair occurs. The ratio Frequencybg / Countbg estimates the probability of the pair $(w_i, w_{i+1})$ occurring; the conditional probability $P(w_{i+1} \mid w_i)$ used by the bigram model (Russell & Norvig, 2003, p. 835) is obtained by dividing Frequencybg by the unigram frequency of $w_i$. Trigrams are stored in the same way in tg(Count) and tg(Word1,Word2,Word3,Frequency).
It is important to note here that the ordering of the words in each list is significant to the calculation as we are aiming at determining the frequency of certain words in a particular order. Thus ‘cat, walk’ and ‘walk, cat’ are regarded as different lists and each may have very different frequencies of occurrence.
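A minimal sketch of both points: consecutive ordered pairs are extracted from a word list, and the conditional bigram probability divides the pair count by the unigram count of the first word. The facts below are hypothetical counts in the ug/2 and bg/3 format; word_pairs/2 and conditional_bigram/3 are illustrative names, not the program's predicates.

```prolog
:- dynamic ug/2, bg/3.

% Hypothetical counts in the report's ug/2 and bg/3 format.
ug(cat, 4).
ug(walk, 2).
bg(cat, walk, 3).
bg(walk, cat, 1).

% word_pairs(+Words, -Pairs): all consecutive ordered pairs W1-W2
word_pairs([W1, W2|Ws], [W1-W2|Pairs]) :-
    !,
    word_pairs([W2|Ws], Pairs).
word_pairs(_, []).

% conditional_bigram(+W1, +W2, -P): P(W2 | W1) = C(W1 W2) / C(W1),
% using the unigram frequency of W1 as the denominator.
conditional_bigram(W1, W2, P) :-
    bg(W1, W2, F12),
    ug(W1, F1),
    F1 > 0,
    P is F12 / F1.
```

Because bg(cat,walk,_) and bg(walk,cat,_) are different facts, the two orders get different probabilities (0.75 and 0.5 with these counts).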
Strictly, to obtain a probability, the frequency of each word or word sequence should be divided by the total number of words or word sequences of that length (and, for a phrase built from several n-grams, the product of the frequencies by the product of the corresponding totals). However, since these totals are the same for every word or sequence of a given length, dividing by them does not change the order, so sorting on the frequencies alone gives the same ranking. It also avoids cumbersomely small figures (possibly resulting in floating point underflow).
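A minimal sketch of ranking by raw frequency, assuming ug(Word, Frequency) facts; the values are the Option 1 counts for four of the top words, and top_unigrams/2, first_k/3 and negate_keys/2 are hypothetical names. The ISO predicate keysort/2 sorts ascending, so the keys are negated to obtain descending order.

```prolog
:- dynamic ug/2.

% ug(Word, Frequency) facts with values from the Option 1 results.
ug(the, 326).
ug(emdr, 79).
ug(of, 204).
ug(and, 139).

% top_unigrams(+K, -Top): the K most frequent words as Frequency-Word
% pairs in descending order of frequency. Dividing every frequency by
% the same total would not change this order.
top_unigrams(K, Top) :-
    findall(NegF-W, (ug(W, F), NegF is -F), Keyed),
    keysort(Keyed, Ascending),      % ascending -F = descending F (stable)
    first_k(K, Ascending, Best),
    negate_keys(Best, Top).

% first_k(+K, +List, -Prefix): at most the first K elements
first_k(0, _, []) :- !.
first_k(_, [], []) :- !.
first_k(K, [X|Xs], [X|Ys]) :-
    K1 is K - 1,
    first_k(K1, Xs, Ys).

% negate_keys(+NegPairs, -Pairs): turn -F back into F
negate_keys([], []).
negate_keys([NegF-W|Ps], [F-W|Qs]) :-
    F is -NegF,
    negate_keys(Ps, Qs).
```

With these facts, ?- top_unigrams(3, T). gives T = [326-the, 204-of, 139-and].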
Several possibilities exist for finding a good description of the contents of a document using the highest-frequency words and sequences. For example, one can start with the highest-frequency unigram, connect it to the most likely following bigram, and so on:
ug as [word1]
bg as [word1, word2], based on the ug at the start
tg as [word1, word2, word3], based on the bg at the start
ng as [word1, word2, …, wordn], based on the (n-1)-gram at the start
(This amounts to a depth-first search of the space of possible sentences.)
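Similarly, the highest-frequency unigram can be placed in a later position, so that the words preceding it are found:
bg as [word1-1, word1], based on the ug in 2nd place
tg as [word1-2, word1-1, word1], based on the ug in 3rd place
(A variation on the depth-first search.)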
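Both strategies can be sketched as backtracking queries over the ug/bg/tg facts. This is a minimal sketch with hypothetical toy counts, not the program's generatesentences predicate; forward_phrase/2 and backward_phrase/3 are illustrative names. The score is the product of the three frequencies, which matches the products listed in the results (e.g. 326 × 23 × 3 = 22494).

```prolog
:- dynamic ug/2, bg/3, tg/4.

% Hypothetical toy counts in the ug/bg/tg format (not real results).
ug(the, 10).
ug(patient, 4).
ug(has, 2).
bg(the, patient, 3).
bg(patient, has, 2).
tg(the, patient, s, 2).
tg(the, patient, has, 1).

% forward_phrase(-Phrase, -Score)
% Depth-first forward strategy: pick a unigram W1, then a bigram that
% starts with W1, then a trigram that starts with W1 W2. All solutions
% are produced on backtracking.
forward_phrase([W1, W2, W3], Score) :-
    ug(W1, F1),
    bg(W1, W2, F2),
    tg(W1, W2, W3, F3),
    Score is F1 * F2 * F3.

% backward_phrase(+W3, -Phrase, -Score)
% Reverse strategy: W3 is the chosen unigram in 3rd place, and the
% bigram and trigram must both end in it.
backward_phrase(W3, [W1, W2, W3], Score) :-
    ug(W3, F1),
    bg(W2, W3, F2),
    tg(W1, W2, W3, F3),
    Score is F3 * F2 * F1.
```

With these facts, ?- findall(P-S, forward_phrase(P, S), L). gives L = [[the,patient,s]-60, [the,patient,has]-30].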
To evaluate the program, I used a document obtained from the internet at http://www.ejch.com/emdr.htm. This is a sample article published by the European Journal of Clinical Hypnosis on Eye Movement Desensitisation and Reprocessing (EMDR) as a therapeutic technique to help patients to reprocess disturbing memories. (See Appendix C)
Two binary-valued input parameters to the program provide four different analyses of the document:
| Option | Do morphological analysis | Remove stop words | Effect |
|---|---|---|---|
| 1 | No | No | Just analyse the words |
| 2 | No | Yes | Remove stop words |
| 3 | Yes | No | Retain stop words and morphemes |
| 4 | Yes | Yes | Remove stop words, keep morphemes |
The results are summarized here. More detail is available in Appendix B.
Option 1: The file contained 6481 tokens, of which 5130 were recognized as words after lexical analysis. That means that 5130 unigrams, 5129 bigrams and 5128 trigrams were found. The ten highest frequency words were:
ug(the,326)
ug(of,204)
ug(and,139)
ug(in,133)
ug(a,127)
ug(to,110)
ug(emdr,79)
ug(with,64)
ug(amp,61)
ug(is,58)
Thus the first meaningful word (the abbreviation “emdr”) occurred only in seventh place. The most frequent bigrams and trigrams are given in Appendix B. The most frequent sequences of length two are “of the” (44 times) and “in the” (36 times). Note the dramatic reduction in frequency compared with unigrams, where the most frequent word occurred 326 times. The reduction for trigrams is also dramatic: the phrase “the treatment of” has the highest frequency (11), followed by “as well as” (9). The word “treatment” is already indicative of content, but the second phrase is not particularly informative.
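Inspection of the results from generatesentences shows that “the patient’s” is the most likely phrase generated when starting from the highest-frequency unigram (“the”, 326) and following it with the most frequent bigram and trigram. Likely phrases ending in “the” are “processing of the” and “one of the”, which “connect” again with “the”. Each row below lists a pattern label, the three words, the frequencies of the n-grams used to build the phrase, the product of those frequencies (e.g. 326 × 23 × 3 = 22494) and the product of the total numbers of unigrams, bigrams and trigrams (5130 × 5129 × 5128 = 134926756560).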
[w123,the,patient,s,326,23,3,22494,134926756560]
[w541,processing,of,the,3,44,326,43032,134926756560]
[w541,one,of,the,2,44,326,28688,134926756560]
Second most likely phrase:
[w123,the,patient,has,326,23,2,14996,134926756560]
[w541,processing,of,the,3,44,326,43032,134926756560]
Option 2: After simplification, i.e. removal of stop words, 2981 words (vs. 5130 before), 2980 bigrams and 2979 trigrams were analysed. The ten highest-frequency words were:
ug(emdr,79)
ug(amp,61)
ug(treatment,49)
ug(patients,33)
ug(shapiro,32)
ug(patient,26)
ug(memory,25)
ug(processing,23)
ug(traumatic,22)
ug(memories,19)
Appendix B shows how quickly these numbers are reduced for bigrams and trigrams.
The most likely phrases are listed below.
[w123,emdr,treatment,emdr,79,7,1,553,26463589020]
[w541,patients,treated,emdr,2,5,79,790,26463589020]
[w541,outpatients,treated,emdr,1,5,79,395,26463589020]
[w541,successfully,treated,emdr,1,5,79,395,26463589020]
[w541,victims,treated,emdr,1,5,79,395,26463589020]
[w123,emdr,treatment,therapist,79,7,1,553,26463589020]
[w541,first,phase,emdr,1,3,79,237,26463589020]
[w541,final,phase,emdr,1,3,79,237,26463589020]
[w541,desensitization,phase,emdr,1,3,79,237,26463589020]
[w541,positive,effects,emdr,2,2,79,316,26463589020]
[w123,emdr,eye,movement,79,3,3,711,26463589020]
From the command line one can construct queries that start with the most likely unigram and then chain bigrams only. This gives “emdr treatment plan” (frequencies 79, 7 and 6, product 3318) and, with the second highest frequency, “traumatic memories” followed by “therapeutic”:
?- bg(traumatic,W2,F1),bg(W2,W3,F2).
F1 = 9,
F2 = 1,
W2 = memories,
W3 = therapeutic ? ;
F1 = 9,
F2 = 1,
W2 = memories,
W3 = method ? ;
F1 = 9,
F2 = 1,
W2 = memories,
W3 = psychologic ? ;
F1 = 9,
F2 = 1,
W2 = memories,
W3 = areas ? ;
F1 = 9,
F2 = 1,
W2 = memories,
W3 = normal ?
2nd highest freq:
?- bg(traumatic,W2,F1),bg(W1,traumatic,F2).
F1 = 9,
F2 = 3,
W1 = post,
W2 = memories ?
Option 3: After morphological analysis, 5203 “words” (vs. 5130) resulted, because new “words”, such as “-(past)”, were introduced to indicate the past tense of verbs. The highest-frequency unigrams found were the same as for option 1. Since only one morphological analysis result can be used (using all alternative analyses would cause incorrect counting), the results were not very interesting. A stemming algorithm to remove suffixes would have performed better.
Phrases are the same as those of option 1:
[w123,the,patient,s,326,23,3,22494,140770297206]
[w541,processing,of,the,3,44,326,43032,140770297206]
[w541,one,of,the,2,44,326,28688,140770297206]
Option 4: The results were the same as for option 2, i.e. morphological analysis did not help much in this case. Detailed results are in Appendix B.
Likely phrases are listed below.
[w123,emdr,treatment,emdr,79,7,1,553,26463589020]
[w541,patients,treated,emdr,2,5,79,790,26463589020]
[w541,outpatients,treated,emdr,1,5,79,395,26463589020]
[w541,successfully,treated,emdr,1,5,79,395,26463589020]
[w541,victims,treated,emdr,1,5,79,395,26463589020]
[w541,first,phase,emdr,1,3,79,237,26463589020]
[w541,final,phase,emdr,1,3,79,237,26463589020]
[w541,desensitization,phase,emdr,1,3,79,237,26463589020]
[w541,positive,effects,emdr,2,2,79,316,26463589020]
[w541,controlled,studies,emdr,1,2,79,158,26463589020]
[w123,emdr,treatment,therapist,79,7,1,553,26463589020]
[w123,emdr,treatment,assessment,79,7,1,553,26463589020]
[w541,patients,treated,emdr,2,5,79,790,26463589020]
An example of another query:
| ?- ug(eye,F0),bg(eye,W2,F1),bg(W2,W3,F2),bg(W3,W4,F3).
F0 = 14,
F1 = 8,
F2 = 2,
F3 = 2,
W1 = eye,
W2 = movements,
W3 = initiated,
W4 = client ? ;
F0 = 14,
F1 = 8,
F2 = 2,
F3 = 1,
W1 = eye,
W2 = movements,
W3 = initiated,
W4 = process ? ;
F0 = 14,
F1 = 8,
F2 = 1,
F3 = 1,
W1 = eye,
W2 = movements,
W3 = studying,
W4 = process ? ;
F0 = 14,
F1 = 8,
F2 = 1,
F3 = 1,
W1 = eye,
W2 = movements,
W3 = initially,
W4 = considered ? ;
F0 = 14,
F1 = 8,
F2 = 1,
F3 = 1,
W1 = eye,
W2 = movements,
W3 = initially,
W4 = emdr ?
yes
The one drawback of eliminating special characters such as full stops s('.'), among others, is that the end-of-sentence information is lost. This arguably reduces semantic interpretability, since sentences seem to be intrinsic units of meaning, and counts of word sequences should probably not extend across sentence boundaries.
Similarly, the possessive ’s (as in user’s) has semantic significance within a text, and removing it arguably reduces the meaningfulness of the text.
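To keep the end-of-sentence information, the token list could be split at sentence-ending symbols before any n-grams are counted. This is a minimal sketch, assuming the s(…) token format of et.pl and treating s('.'), s('!') and s('?') as sentence ends; split_sentences/2 and its helpers are hypothetical names.

```prolog
% Sentence-ending tokens (an assumption for this sketch).
sentence_end(s('.')).
sentence_end(s('!')).
sentence_end(s('?')).

% split_sentences(+Tokens, -Sentences)
% Split the tokenizer output at sentence ends so that n-grams can be
% counted per sentence. End symbols are dropped, as are empty sentences.
split_sentences(Tokens, Sentences) :-
    split_sentences(Tokens, [], Sentences).

% split_sentences(+Tokens, +CurrentReversed, -Sentences)
split_sentences([], Current, Sentences) :-
    close_sentence(Current, [], Sentences).
split_sentences([T|Ts], Current, Sentences) :-
    (   sentence_end(T)
    ->  split_sentences(Ts, [], Rest),
        close_sentence(Current, Rest, Sentences)
    ;   split_sentences(Ts, [T|Current], Sentences)
    ).

% close_sentence(+CurrentReversed, +Rest, -Sentences)
close_sentence([], Rest, Rest) :- !.
close_sentence(Current, Rest, [Sentence|Rest]) :-
    reverse_acc(Current, [], Sentence).

% reverse_acc(+List, +Acc, -Reversed)
reverse_acc([], Acc, Acc).
reverse_acc([X|Xs], Acc, Ys) :-
    reverse_acc(Xs, [X|Acc], Ys).
```

Each resulting sentence would then be reduced to words and counted separately, so that no bigram or trigram spans two sentences.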
As I alluded to above, this application can be extended to compute any n-gram, given n as an input parameter. The problem is that the more words an n-gram contains, the lower its potential frequency of occurrence.
It should also be highlighted that the terms ug(word,freq), bg(word1,word2,freq) and tg(word1,word2,word3,freq) are the data structures (knowledge structures) that store the frequency counts, and are thus the building blocks of the knowledge representation system used in this application. These could be normalised into a generic data structure, the predicate ng (n-gram), of the form ng([List of n words], freq), where n is determined by the number of words in the list.
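A minimal sketch of what the normalised ng/2 representation might look like, counting every window of N consecutive words for any N ≥ 1. The predicate names count_ngrams/2, first_n/3 and increment_ngram/1 are hypothetical; only the ng([List of n words], freq) form comes from the text.

```prolog
:- dynamic ng/2.

% count_ngrams(+N, +Words)
% Hypothetical sketch: every window of N consecutive words in Words
% increments the corresponding ng(Ngram, Freq) fact.
count_ngrams(N, Words) :-
    (   first_n(N, Words, Ngram)
    ->  increment_ngram(Ngram),
        Words = [_|Rest],
        count_ngrams(N, Rest)
    ;   true                        % fewer than N words left: done
    ).

% first_n(+N, +List, -Prefix): Prefix is the first N elements of List
first_n(0, _, []) :- !.
first_n(N, [X|Xs], [X|Ys]) :-
    N > 0,
    N1 is N - 1,
    first_n(N1, Xs, Ys).

% increment_ngram(+Ngram): create ng(Ngram, 1) or add one to its count
increment_ngram(Ngram) :-
    (   retract(ng(Ngram, F0))
    ->  F is F0 + 1
    ;   F = 1
    ),
    assertz(ng(Ngram, F)).
```

After ?- count_ngrams(2, [a,b,a,b])., the database holds ng([a,b],2) and ng([b,a],1).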
However, although this would result in a more uniform knowledge representation, it would not be the most efficient approach. A separate term for each sequence length is faster to look up, because first-argument indexing can use the first word of a bg/3 or tg/4 fact, whereas every ng/2 fact has a list as its first argument; the price is that a new term is needed for each sequence length. More importantly, using a list is much more difficult to program for the general case.
In this application I have used the first two strategies illustrated above in the generatesentences predicate, as examples of how this could be done. However, because the second half of the predicate ends in a ‘fail’ that forces backtracking, all the possible variations are written to file. Processing power and disk space permitting, one could implement a sort that orders this output by descending frequency, making it easier to analyse the most likely sentences.
Another way to limit the output would be to use a cut-fail combination (‘!, fail’) at the end of the generatesentences predicate, as indicated in the program comments. This would ensure that only the first solution of generatesentences is produced, so that only the sentences generated down the first branch of the search tree need to be evaluated.
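Instead of a failure-driven loop, all candidate phrases could be collected with findall/3, sorted and truncated to the best K. This is a minimal sketch with hypothetical toy counts; ranked_phrases/2 and take_best/3 are illustrative names, and the score is again the product of the unigram, bigram and trigram frequencies.

```prolog
:- dynamic ug/2, bg/3, tg/4.

% Hypothetical toy counts (not real results).
ug(the, 10).
ug(patient, 4).
bg(the, patient, 3).
bg(patient, has, 2).
tg(the, patient, s, 2).
tg(the, patient, has, 1).
tg(patient, has, been, 1).

% ranked_phrases(+K, -Best)
% Collect every forward phrase W1 W2 W3 with its score, sort by score
% (descending) and keep at most K of them as Score-Phrase pairs.
ranked_phrases(K, Best) :-
    findall(NegScore-[W1, W2, W3],
            ( ug(W1, F1),
              bg(W1, W2, F2),
              tg(W1, W2, W3, F3),
              NegScore is -(F1 * F2 * F3) ),
            Keyed),
    keysort(Keyed, Sorted),         % ascending NegScore = descending score
    take_best(K, Sorted, Best).

% take_best(+K, +SortedPairs, -Best): first K entries, scores made positive
take_best(0, _, []) :- !.
take_best(_, [], []) :- !.
take_best(K, [NegS-Phrase|Rest], [S-Phrase|Best]) :-
    S is -NegS,
    K1 is K - 1,
    take_best(K1, Rest, Best).
```

With these facts, ?- ranked_phrases(2, B). gives B = [60-[the,patient,s], 30-[the,patient,has]]. Only the best K phrases need to be written to file, instead of every variation found on backtracking.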
Having completed the experiment, I conclude that probabilistic language models are not the most effective or accurate language models for the semantic analysis of documents. However, I concede that when one is analysing large corpora of documents like the World Wide Web, probabilistic language models offer the most time- and resource-efficient mechanism to satisfy a potential user. The results here still need refinement, but they do extract words and phrases that are strongly related to the true content of the document.
The development of this application has also shown that it lends itself to progressive refinement through further development and test iterations.
Covington, M. A. (1994). Natural Language Processing for Prolog Programmers. Prentice Hall, Englewood Cliffs, New Jersey, pp. 1-35.
Covington, M. A. (2003). ET: An Efficient Tokenizer in ISO Prolog. http://www.ai.uga.edu/mc/
Covington, M. A. (2003). et.pl (program). http://www.ai.uga.edu/mc/
Schlachter, J. G. (2003). ProNTo_Morph: Morphological Analysis Tool for use with ProNTo (Prolog Natural Language Toolkit). http://www.ai.uga.edu/mc/
Schlachter, J. G. (2003). pronto_morph_engine.pl and associated programs. http://www.ai.uga.edu/mc/
Russell, S. & Norvig, P. (2003). Artificial Intelligence: A Modern Approach, 2nd ed. Pearson Education, New Jersey, pp. 834-836, 840-846.
/******************************************************/
%
% PROGRAM: w.pl
% DATE: 22/12/2003
% AUTHOR: Wollie Boehm
%
/******************************************************/
% env modules for SICStus
:- prolog_flag(language, _, iso).
:- use_module(library(system)).
:- use_module(library(lists)).
% use_module(library(system),[]), system:working_directory(_,'C:/My Documents/Work/AI 1/Project/Ir').
:- ensure_loaded('et.pl').                  % Efficient Tokenizer (author: MC)
:- ensure_loaded('pronto_morph_engine.pl'). % (author: JS) !!! REMEMBER name clash with suffix/2 !!! (skip)
:- ensure_loaded('sr.pl').                  % Simplification Rules (author: WB)
/******************************************************/
%
% Flags to run Morph analysis, Simplify and specify if the input file is an HTML file
% +yes/anythingbutyes, +yes/anythingbutyes, +yes/anythingbutyes,
% +Name of file for input text, +File for results
%
run(Morphanal, Simplif, Htmlfiletype, Inputfilename, Outputfilename) :-
    % Can accept any input or output file specified in query
    tokenize_file(Inputfilename, Tokens),        % Use character classes for lexical analysis in 'et.pl'
    tell(Outputfilename),                        % Open output file
    write('Number of tokens: '),
    length(Tokens, N),
    write(N), nl,
    write(Tokens),                               % Form [class([l,i,s,t]),class([t,o,k,e,n])] of tokens
    nl, nl,
    (   Htmlfiletype == yes                      % Is this an HTML file?
    ->  removehtmltokens(Tokens, Newtokenlist)   % Strip all < HTMLkeyword > or </ HTMLkeyword >
    ;   Newtokenlist = Tokens
    ),
    tokens_words(Newtokenlist, Listofwords),     % Keep the words, throw away special symbols, etc.
    write('Number of words: '),
    length(Listofwords, N1),
    write(N1), nl,                               % N1 is the number of words
    write(Listofwords),                          % Form [List,of,words]
    nl, nl,
    (   Morphanal == yes                         % IF input param for morph analysis = yes
    ->  morphemes(Listofwords, Result1)          % THEN run morphemes +Listofwords -Result1
    ;   Result1 = Listofwords                    % ELSE unify Result1 with Listofwords
    ),
    (   Simplif == yes                           % IF input param for simplify = yes
    ->  simplify(Result1, Result2)               % THEN run simplify +Result1 -Result2
    ;   Result2 = Result1                        % ELSE unify Result2 with Result1
    ),
    write('Number of words for uni-,bi-,trigrams: '),
    length(Result2, N2),
    write(N2),                                   % Number of words left after morphanal &/ simplify
    nl,
    write(Result2),                              % Output the list of words after morphanal &/ simplify
    nl, nl,
    unigram(Result2),                            % Calculates the frequency of unigrams
    bigram(Result2),                             % Calculates the frequency of bigrams
    trigram(Result2),                            % Calculates the frequency of trigrams
    saveubtgrams,                                % Saves uni-, bi-, trigrams to file to process a corpus of many files
    generatesentences,                           % Attempts to generate sentences
    told.                                        % Closes the output file
/******************************************************/
% cleardata/0
% Utility procedure to remove the words and frequency data
/******************************************************/
cleardata :-
    retractall( ug(_) ),        % Forget the old total (retractall also succeeds if none is stored)
    assertz( ug(0) ),
    retractall( ug(_,_) ),      % Remember nothing
    retractall( bg(_) ),        % Forget the old total
    assertz( bg(0) ),
    retractall( bg(_,_,_) ),    % Remember nothing
    retractall( tg(_) ),        % Forget the old total
    assertz( tg(0) ),
    retractall( tg(_,_,_,_) ).  % Remember nothing
/******************************************************/
%
%
removehtmltokens(+ListofTokens,-Newtokenlist)
% strip all <
HTMLkeyword > or < /HTMLkeyword > from the input list
% the tokenizer
recognizes the special symbols < and > that surround HTML keywords
% Use that to strip
all such sequences of 3 or 4 "words" first,
% and if that does
not work, then remove the list of symbols between < and >
%
/******************************************************/
removehtmltokens([],[]). % stop condition
% Two clauses which check for the most common HTML token sequences
% (strictly speaking, not necessary, since the general '<' ... '>' clause below covers both of these special cases)
removehtmltokens([s('<'),_,s('>')|Restlist],Newtokenlist) :-
!,
removehtmltokens(Restlist,Newtokenlist).
removehtmltokens([s('<'),s('/'),_,s('>')|Restlist],Newtokenlist) :-
!,
removehtmltokens(Restlist,Newtokenlist).
% Match a list of symbols between '<' and '>' where
% '>' is the end symbol of the HTML "token"
removehtmltokens([s('<')|Restlist],Newtokenlist) :-
insidehtmltokens(Restlist,Listafterendsymbol),
!,
removehtmltokens(Listafterendsymbol,Newtokenlist).
% If the above do not match an HTML token, then it is a single symbol, so keep it
removehtmltokens([Head|Restlist],[Head|Restnewtokenlist]) :-
!,
removehtmltokens(Restlist,Restnewtokenlist).
/******************************************************/
% insidehtmltokens/2
/******************************************************/
insidehtmltokens([s('>')|Restlist],Restlist). % found end symbol, i.e. >
insidehtmltokens([_|Restlist],Listafterendsymbol) :- % remove head
insidehtmltokens(Restlist,Listafterendsymbol). % then check again whether > is found
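For comparison, the same stripping can be sketched as a DCG over the tokenizer's output. This is an alternative formulation, not the author's code; it assumes the token shapes s(Symbol) and w(Chars) that appear in the removehtmltokens/2 trace in the results, and the names strip_html/2, tokens//1 and skip_to_close//0 are hypothetical.

```prolog
% Hypothetical DCG sketch: drop every s('<') ... s('>') span from a token list.
% Tokens are assumed to look like s(Symbol) and w(Chars), as in the trace.
strip_html(Tokens, Kept) :-
    phrase(tokens(Kept), Tokens).

tokens([]) --> [].
tokens(Kept) -->                 % a complete <...> span: drop it
    [s('<')], skip_to_close, !,
    tokens(Kept).
tokens([T|Kept]) -->             % anything else (including an unmatched '<'): keep it
    [T],
    tokens(Kept).

% skip_to_close//0 consumes tokens up to and including the first s('>').
skip_to_close --> [s('>')], !.
skip_to_close --> [_], skip_to_close.
```

One general rule plus a helper replaces the two special-case clauses, which the comments above already describe as covered by the general case; an unclosed '<' is kept, as in removehtmltokens/2.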
/******************************************************/
%
% simplify(+List,-Result)
% Applies simplification rules to List, giving Result.
%
/******************************************************/
simplify([],[]). % Stopping condition
simplify(List,Result):- % One of the simplification rules matches,
sr(List,NewList), % so apply it,
!, % cut to avoid backtracking, and
simplify(NewList,Result). % try to further simplify the result
simplify([W|Words],[W|NewWords]):- % No simplification rules match
simplify(Words,NewWords). % so advance to the next word
/******************************************************/
%
% morphemes(+Listofwords,-Outputlist)
%
/******************************************************/
morphemes(Listofwords,Outputlist) :- % Map words to morphemes (basic semantic units)
morph_atoms(Listofwords,Morphemelist), % Output from morph_atoms is a list of lists
flattenlist(Morphemelist,Outputlist). % Reduce list of lists to a simple list
/******************************************************/
% flattenlist(+ListOfLists,-Outputlist)
/******************************************************/
flattenlist([Head1,Head2|Tail],Outputlist) :- % Both heads are lists, so just combine them
lists:append(Head1,Head2,Newhead), % Call append from module lists: +Head1 +Head2 -Newhead
!,
flattenlist([Newhead|Tail],Outputlist). % Repeat until exactly one list remains inside another
flattenlist([[Head|Tail]],[Head|Tail]). % Reduce list of lists to a simple list
flattenlist([[]],[]). % Stopping condition
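flattenlist/2 appends onto an ever-growing first list, so earlier elements are copied again at every step, and it has no clause for an empty outer list. A minimal alternative sketch, assuming only append/3 from library(lists); concat_lists/2 is a hypothetical name, not part of the program.

```prolog
:- use_module(library(lists)).   % append/3

% concat_lists(+ListOfLists, -Flat): hypothetical one-level flattening.
% Each inner list is appended in front of the still-open remainder of Flat,
% so every element is copied exactly once.
concat_lists([], []).
concat_lists([L|Ls], Flat) :-
    append(L, Rest, Flat),
    concat_lists(Ls, Rest).
```

With the same input as the flattenlist/2 trace in the results, concat_lists([[l,i,s,t],[o,f],[l,i,s,t]],X) gives X = [l,i,s,t,o,f,l,i,s,t].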
/******************************************************/
%
% unigram
% Computes unigrams: frequency of occurrence in total...
%
/******************************************************/
:- dynamic ug/2, ug/1. % predicates need to be defined as dynamic for assert & retract
% Record the total number of words encountered to compute estimated probabilities
ug(0). % Counter: when processing multiple files, this may be non-zero
unigram(List) :- % List of words for which to compute unigram frequency of occurrence
% List instantiated from Result2
ug(TotalSoFar), % Remember the previous total number of words found
unigram(List,TotalSoFar). % Process this list, given the total words seen previously
unigram([],TotalWords) :- % End of list with total number of words processed
retract( ug(_OldTotal) ), % Forget the old total
assertz( ug(TotalWords) ), % Remember the new total
write('Number of unigrams found: '),
write(TotalWords),
nl,
findall(F-ug(W,F),ug(W,F),Uglist), % Prepare a list to sort from low to high freq
% findall(?Template,:Goal,?Bag)
% All vars are existential
% Frequency-ug is the Key-Value form expected by keysort
% findall/3 succeeds only once
write('Unsorted list: '),
write(Uglist),
nl,nl,
keysort(Uglist,SortedUglist), % Sort orders from low to high
% keysort(+List1,?List2)
% List1 expected in the form Key-Value
write('Sorted list: '),
write(SortedUglist),
nl,nl,
retractall(ug(_,_)), % Erases all ug/2, as they are reasserted from high to low frequency
% retractall(:Head)
% ug must be dynamically defined
restoreterms(SortedUglist). % Assert list in descending order
unigram([Word|Restlist],Total) :- % Count word occurrences
existsug(Word,Freq), % Encountered this word before?
Newfreq is Freq + 1, % Update freq count of word
Newtotal is Total + 1, % Update total number of words (occurrences, not distinct words) so far
asserta( ug(Word,Newfreq) ), % Assert the unigram and new freq to database
!, % Avoid backtracking
unigram(Restlist,Newtotal). % Do the rest of the list
/******************************************************/
% existsug(+Word,-Freq)
% Checks whether the word has been seen before while counting unigram frequency, to increment totals
/******************************************************/
existsug(Word,Freq) :- % If word found, then return its previous count
% write('ug exists '), % For own debugging (can be deleted)
% write(Word),
% nl,
clause( ug(Word,Freq), true ), % Does ug(Word,Freq) exist? Will only unify if yes
retract( ug(Word,Freq) ), % If it exists, then erase old unigram & freq so new unigram freq can be asserted
!. % No backtracking for this predicate
existsug(_,0). % Not found, then count is zero
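Because retract/1 both tests for a fact and removes it, the clause/2 check in existsug/2 is not strictly needed. A minimal sketch of the lookup-and-increment step under that observation; count_word/1 is a hypothetical name, and the sketch only covers the per-word update, not the running total kept in ug/1.

```prolog
:- dynamic ug/2.

% count_word(+Word): hypothetical replacement for existsug/2 followed by asserta/1.
% There is at most one ug(Word,_) fact per word, so committing to the
% first retract/1 solution is safe.
count_word(Word) :-
    (   retract(ug(Word, F0))
    ->  F is F0 + 1          % seen before: increment
    ;   F = 1                % first occurrence
    ),
    assertz(ug(Word, F)).
```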
/******************************************************/
% restoreterms
% Asserts a list in reverse order (asserta/1 places each term before those already asserted)
/******************************************************/
restoreterms([]). % Finished the list of entries Number-Someterm
restoreterms([_Number-Term|Restsortedlist]) :- % So assert them in reverse order
asserta(Term),
!,
restoreterms(Restsortedlist).
/******************************************************/
%
% bigram
% Computes bigrams: frequency of occurrence in total...
%
/******************************************************/
:- dynamic bg/3, bg/1. % predicates need to be defined as dynamic for assert & retract
% Record the total number of bigrams encountered to compute estimated probabilities
bg(0). % Counter: when processing multiple files, this may be non-zero
bigram(List) :- % List of words for which to compute bigram frequency of occurrence
bg(TotalSoFar), % Remember the previous total number of bigrams found
bigram(List,TotalSoFar). % Process this list, given the total bigrams seen previously
bigram([_],TotalWords) :- % End of list with total number of bigrams processed
retract( bg(_OldTotal) ), % Forget the old total
assertz( bg(TotalWords) ), % Remember the new total
write('Number of bigrams found: '),
write(TotalWords),
nl,
findall(F-bg(W1,W2,F),bg(W1,W2,F),Bglist), % Prepare a list to sort from low to high freq
% findall(?Template,:Goal,?Bag)
% All vars are existential
% Frequency-bg is the Key-Value form expected by keysort
% findall/3 succeeds only once
write('Unsorted list: '),
write(Bglist),
nl,nl,
keysort(Bglist,SortedBglist), % Sort orders from low to high
% keysort(+List1,?List2)
% List1 expected in the form Key-Value
write('Sorted list: '),
write(SortedBglist),
nl,nl,
retractall(bg(_,_,_)), % Erases all bg/3, as they are reasserted from high to low frequency
% retractall(:Head)
% bg must be dynamically defined
restoreterms(SortedBglist). % So assert them in descending order
bigram([Word,Word2|Restlist],Total) :- % Count bigram occurrences
existsbg(Word,Word2,Freq), % Encountered this bigram pair before?
Newfreq is Freq + 1, % Update freq count of bigram pair
Newtotal is Total + 1, % Update total number of bigrams (occurrences, not distinct pairs) so far
asserta( bg(Word,Word2,Newfreq) ), % Assert the bigram and new freq to database
!, % Avoid backtracking
bigram([Word2|Restlist],Newtotal). % Do the rest of the list
/******************************************************/
% existsbg(+Word,+Word2,-Freq)
% Checks whether the word pair has been seen before while counting bigram frequency, to increment totals
/******************************************************/
existsbg(Word,Word2,Freq) :- % If bigram found, then return its freq count
% write('bg exists '), % For own debugging (can be deleted)
% write(Word),
% write(' '),
% write(Word2),
% nl,
clause( bg(Word,Word2,Freq), true ), % Does clause bg(Word,Word2,Freq) exist?
% Will only unify if yes
retract( bg(Word,Word2,Freq) ), % If it exists, then erase old bigram and freq
% so new bigram freq can be asserted
!. % Avoid backtracking
existsbg(_,_,0). % Bigram not found, then set count to zero
/******************************************************/
% trigram
% Computes trigrams: frequency of occurrence in total...
/******************************************************/
:- dynamic tg/4, tg/1. % predicates need to be defined as dynamic for assert & retract
% Record the total number of trigrams encountered to compute estimated probabilities
tg(0). % Counter: when processing multiple files, this may be non-zero
trigram(List) :- % List of words for which to compute trigram frequency of occurrence
tg(TotalSoFar), % Remember the previous total number of trigrams found
trigram(List, TotalSoFar). % Process this list, given the total trigrams seen previously
trigram([_,_],TotalWords) :- % End of list with total number of trigrams processed
retract( tg(_OldTotal) ), % Forget the old total
assertz( tg(TotalWords) ), % Remember the new total
write('Number of trigrams found: '),
write(TotalWords),
nl,
findall(F-tg(W1,W2,W3,F),tg(W1,W2,W3,F),Tglist), % Prepare a list to sort from low to high freq
% findall(?Template,:Goal,?Bag)
% All vars are existential
% Frequency-tg is the Key-Value form expected by keysort
% findall/3 succeeds only once
write('Unsorted list: '),
write(Tglist),
nl,nl,
keysort(Tglist,SortedTglist), % Sort orders from low to high
% keysort(+List1,?List2)
% List1 expected in the form Key-Value
write('Sorted list: '),
write(SortedTglist),
nl,nl,
retractall(tg(_,_,_,_)), % Erases all tg/4, as they are reasserted from high to low frequency
% retractall(:Head)
% tg must be dynamically defined
restoreterms(SortedTglist). % So assert them in descending order
trigram([Word,Word2,Word3|Restlist],Total) :- % Count trigram occurrences
existstg(Word,Word2,Word3,Freq), % Encountered this trigram before?
Newfreq is Freq + 1, % Update freq count of trigram
Newtotal is Total + 1, % Update total number of trigrams (occurrences, not distinct triples) so far
asserta( tg(Word,Word2,Word3,Newfreq) ), % Assert the trigram and new freq to database
!, % Avoid backtracking
trigram([Word2,Word3|Restlist],Newtotal). % Do the rest of the list
/******************************************************/
% existstg(+Word,+Word2,+Word3,-Freq)
% Checks whether the word triple has been seen before while counting trigram frequency, to increment totals
/******************************************************/
existstg(Word,Word2,Word3,Freq) :- % If trigram found, then return its freq count
% write('tg exists '), % For own debugging (can be deleted)
% write(Word),
% write(' '),
% write(Word2),
% write(' '),
% write(Word3),
% nl,
clause( tg(Word,Word2,Word3,Freq), true ), % Does clause tg(Word,Word2,Word3,Freq) exist?
% Will only unify if yes
retract( tg(Word,Word2,Word3,Freq) ), % If it exists, then erase old trigram and freq
% so new trigram freq can be asserted
!. % Avoid backtracking
existstg(_,_,_,0). % Trigram not found, then set count to zero
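The three counters above keep their state in the dynamic database through assert and retract. As an alternative technique (not the author's approach), counts for any n can be computed in one pass by sorting. This sketch assumes a list of atoms such as Result2; count_ngrams/3, window/3 and count_run/5 are hypothetical names.

```prolog
:- use_module(library(lists)).   % append/3, reverse/2

% count_ngrams(+N, +Words, -Counts): Counts is a list of Freq-Gram pairs,
% most frequent first; each Gram is a list of N consecutive words.
count_ngrams(N, Words, Counts) :-
    findall(G, window(N, Words, G), Grams),   % every N-word window, in text order
    msort(Grams, Sorted),                     % identical n-grams become adjacent
    runs(Sorted, Pairs),                      % collapse each run into Freq-Gram
    keysort(Pairs, Ascending),                % stable sort on Freq, low to high
    reverse(Ascending, Counts).               % high to low, like restoreterms/1

% window(+N, +Words, -G): G is a run of N consecutive elements of Words.
window(N, Words, G) :-
    length(G, N),
    append(_, Suffix, Words),
    append(G, _, Suffix).

% runs(+SortedGrams, -Pairs): run-length encoding of a sorted list.
runs([], []).
runs([G|Gs], [F-G|Pairs]) :-
    count_run(G, Gs, 1, F, Rest),
    runs(Rest, Pairs).

count_run(G, [H|T], F0, F, Rest) :-
    H == G, !,                    % same n-gram as the current run
    F1 is F0 + 1,
    count_run(G, T, F1, F, Rest).
count_run(_, Rest, F, F, Rest).   % run ended, or list exhausted
```

The totals recorded in ug/1, bg/1 and tg/1 correspond to the number of windows, length(Words) - N + 1, while length(Counts) gives the number of distinct n-grams. No cleardata/0 step is needed, because nothing is stored globally.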
/******************************************************/
%
% saveubtgrams/0
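% Saves the data of uni-, bi-, trigrams to the current output file so that it can be
% consulted again (write/1 adds no terminating full stop, so each line needs one,
% and the progress messages must be removed, before consult/1 will accept the file)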
%
/******************************************************/
saveubtgrams :-
ug(Total1), % Recall total num unigrams counted in Inputfile from ug/1
write( ug(Total1) ), nl,
findall(ug(W,F),ug(W,F),Uglist), % Collect unigrams in a list
writelistelems(Uglist),
bg(Total2), % Recall total num bigrams counted in Inputfile from bg/1
write( bg(Total2) ), nl,
findall(bg(W,W2,F),bg(W,W2,F),Bglist), % Collect bigrams in a list
writelistelems(Bglist),
tg(Total3), % Recall total num trigrams counted in Inputfile from tg/1
write( tg(Total3) ), nl,
findall(tg(W,W2,W3,F),tg(W,W2,W3,F),Tglist), % Collect trigrams in a list
writelistelems(Tglist).
/******************************************************/
% writelistelems/1
% Write elements of list to the current output file
/******************************************************/
writelistelems([]) :- % Stopping condition
nl.
writelistelems([Term|Restlist]) :-
write(Term),nl,
!,
writelistelems(Restlist).
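write/1 emits neither a clause-terminating full stop nor quotes around atoms that need them, so the listing produced by saveubtgrams/0 is readable but not directly loadable. A minimal sketch of a consultable dump, assuming the facts go to a separate file; save_facts/2 and write_clauses/2 are hypothetical names.

```prolog
% save_facts(+File, +Terms): hypothetical helper that writes each term as a
% clause (quoted where necessary, terminated by a full stop) so that
% consult/1 or read/2 can load it again.
save_facts(File, Terms) :-
    open(File, write, Stream),
    write_clauses(Terms, Stream),
    close(Stream).

write_clauses([], _).
write_clauses([Term|Terms], Stream) :-
    writeq(Stream, Term),   % quotes atoms such as 'don''t' where needed
    write(Stream, '.'),     % the clause terminator that write/1 alone omits
    nl(Stream),
    write_clauses(Terms, Stream).
```

For example, save_facts('ngrams.pl', [ug(3), ug(three,1), bg(three,unigram,1)]) writes three facts that consult/1 can read back.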
/******************************************************/
%
% generatesentences/0
% Compute some of the most likely sentences
% This predicate could be extended...
%
/******************************************************/
generatesentences :-
ug(T1), % Retrieve totals for word sequences
bg(T2),
tg(T3),
nl,nl,nl,
% Actually, prob. is F1*F2*F3 / (T1*T2*T3),
Divisor is (T1*T2*T3), % but the numbers become very small
% For words in sequence 1 2 3 with Word1 the initiator of the sequence, i.e. words following Word1
ug(Word1,Freq1), % Highest freq word found
bg(Word1,Word2,Freq2), % Find bg with highest freq where Word1 is the highest frequency ug
tg(Word1,Word2,Word3,Freq3), % Find tg with highest freq where Word1 is the highest frequency ug
% and Word1,Word2 is the highest frequency bg found given ug.
% Since ug, bg and tg are asserted in descending order, the 1st
% clause found that unifies should also be the highest freq
Probability1 is Freq1 * Freq2 * Freq3, % Actually, prob. is F1*F2*F3 / (T1*T2*T3)
write([w123,Word1,Word2,Word3,Freq1,Freq2,Freq3,Probability1,Divisor]),
nl,
% For words in sequence 5 4 1 with Word1 ending the sequence, i.e. words preceding Word1
bg(Word4,Word1,Freq4), % Find the highest freq bg where the 2nd word in the sequence
% is the highest frequency word found in ug above
tg(Word5,Word4,Word1,Freq5), % Find the highest freq tg where the 2nd and 3rd words in the sequence
% are the highest frequency word sequence found in bg above, given ug
Probability2 is Freq1 * Freq4 * Freq5, % Actually, prob. is F1*F4*F5 / (T1*T2*T3)
write([w541,Word5,Word4,Word1,Freq5,Freq4,Freq1,Probability2,Divisor]),nl,
fail. % Backtracks, so gives output of all possible solutions.
% Replacing this with '!, fail.' would prevent the backtracking and print only
% the sequences of a single pass, but the cut would also discard the clause below,
% so generatesentences would then fail
generatesentences. % Assures program success
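generatesentences/0 relies on clause order: for a given Word1 it first takes the most frequent bigram and only then looks for a trigram, so the sequence it prints first need not have the largest F1·F2·F3. A sketch that ranks all continuations by that product instead, with a handful of facts copied from the Option 2 results; best_w123/3 is a hypothetical name.

```prolog
:- use_module(library(lists)).   % last/2

% A few facts copied from the Option 2 results (stop words removed).
ug(emdr, 79).
bg(emdr, treatment, 7).
bg(emdr, eye, 3).
tg(emdr, treatment, emdr, 1).
tg(emdr, eye, movement, 3).

% best_w123(+W1, -Score, -Sequence): the W1 W2 W3 sequence with the largest
% F1*F2*F3 score (the numerator printed by generatesentences/0).
best_w123(W1, Score, [W1, W2, W3]) :-
    ug(W1, F1),
    findall(S-[X, Y],
            ( bg(W1, X, F2), tg(W1, X, Y, F3), S is F1 * F2 * F3 ),
            Scored),
    keysort(Scored, Ascending),        % lowest score first
    last(Ascending, Score-[W2, W3]).   % so the last entry has the highest score
```

With these facts the most frequent bigram path, emdr treatment emdr, scores 79 × 7 × 1 = 553, while emdr eye movement scores 79 × 3 × 3 = 711. These are the same two numerators that appear in the w123 lines of the Option 2 output.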
/******************************************************/
% PROGRAM: sr(+W1,-T1)
% Author: Wollie Boehm
% Date: 22/12/03
% Simplification rules.
% This list of rules can be expanded and refined...
/******************************************************/
/******************************************************/
% A
/******************************************************/
sr([a|X],X). % Splits the list into head & tail,
sr([able|X],X). % removes the head, by instantiating only the tail
sr([about|X],X).
sr([after|X],X).
sr([al|X],X).
sr([all|X],X).
sr([also|X],X).
sr([am|X],X).
sr([an|X],X).
sr([and|X],X).
sr([any|X],X).
sr([are|X],X).
sr([as|X],X).
sr([at|X],X).
/******************************************************/
% B
/******************************************************/
sr([back|X],X).
sr([be|X],X).
sr([because|X],X).
sr([became|X],X).
sr([become|X],X).
sr([been|X],X).
sr([begin|X],X).
sr([began|X],X).
sr([best|X],X).
sr([between|X],X).
sr([bring|X],X).
sr([but|X],X).
sr([by|X],X).
/******************************************************/
% C
/******************************************************/
sr([can|X],X).
sr([case|X],X).
sr([come|X],X).
sr([could|X],X).
/******************************************************/
% D
/******************************************************/
sr([do|X],X).
sr([did|X],X).
sr([due|X],X).
sr([during|X],X).
/******************************************************/
% E
/******************************************************/
sr([et|X],X).
sr([ever|X],X).
/******************************************************/
% F
/******************************************************/
sr([fine|X],X).
sr([for|X],X).
sr([from|X],X).
sr([further|X],X).
/******************************************************/
% G
/******************************************************/
sr([gave|X],X).
sr([give|X],X).
sr([given|X],X).
sr([go|X],X).
sr([gone|X],X).
sr([good|X],X).
/******************************************************/
% H
/******************************************************/
sr([had|X],X).
sr([has|X],X).
sr([have|X],X).
sr([he|X],X).
sr([her|X],X).
sr([hers|X],X).
sr([his|X],X).
sr([however|X],X).
/******************************************************/
% I
/******************************************************/
sr([i|X],X).
sr([if|X],X).
sr([in|X],X).
sr([into|X],X).
sr([is|X],X).
sr([it|X],X).
sr([its|X],X).
/******************************************************/
% K
/******************************************************/
sr([known|X],X).
/******************************************************/
% L
/******************************************************/
sr([later|X],X).
sr([like|X],X).
/******************************************************/
% M
/******************************************************/
sr([may|X],X).
sr([many|X],X).
sr([mere|X],X).
sr([met|X],X).
sr([more|X],X).
sr([most|X],X).
sr([much|X],X).
/******************************************************/
% N
/******************************************************/
sr([never|X],X).
sr([no|X],X).
sr([non|X],X).
sr([none|X],X).
sr([not|X],X).
sr([now|X],X).
/******************************************************/
% O
/******************************************************/
sr([of|X],X).
sr([often|X],X).
sr([on|X],X).
sr([only|X],X).
sr([or|X],X).
sr([other|X],X).
sr([others|X],X).
sr([own|X],X).
sr([over|X],X).
/******************************************************/
% P
/******************************************************/
sr([provide|X],X).
/******************************************************/
% R
/******************************************************/
sr([range|X],X).
sr([recent|X],X).
/******************************************************/
% S
/******************************************************/
sr([same|X],X).
sr([see|X],X).
sr([seem|X],X).
sr([seemed|X],X).
sr([she|X],X).
sr([show|X],X).
sr([simple|X],X).
sr([since|X],X).
sr([some|X],X).
sr([sometimes|X],X).
sr([such|X],X).
/******************************************************/
% T
/******************************************************/
sr([than|X],X).
sr([that|X],X).
sr([the|X],X).
sr([their|X],X).
sr([them|X],X).
sr([then|X],X).
sr([there|X],X).
sr([to|X],X).
sr([took|X],X).
sr([these|X],X).
sr([they|X],X).
sr([this|X],X).
sr([those|X],X).
sr([try|X],X).
/******************************************************/
% U
/******************************************************/
sr([up|X],X).
sr([us|X],X).
sr([use|X],X).
sr([uses|X],X).
sr([used|X],X).
sr([under|X],X).
/******************************************************/
% V
/******************************************************/
sr([via|X],X).
sr([very|X],X).
/******************************************************/
% W
/******************************************************/
sr([was|X],X).
sr([want|X],X).
sr([way|X],X).
sr([well|X],X).
sr([were|X],X).
sr([what|X],X).
sr([when|X],X).
sr([where|X],X).
sr([whether|X],X).
sr([which|X],X).
sr([while|X],X).
sr([who|X],X).
sr([whose|X],X).
sr([why|X],X).
sr([wide|X],X).
sr([widely|X],X).
sr([with|X],X).
/******************************************************/
% Y
/******************************************************/
sr([yet|X],X).
sr([you|X],X).
/******************************************************/
% Additional tokens to be removed (generated during morphological analysis),
% so use simplify to get rid of them as well
% (implies: first do morphological analysis, then simplify)
/******************************************************/
sr([-(sg3)|X],X).
sr([-(en)|X],X).
sr([-(past)|X],X).
sr([-(ed)|X],X).
sr([-(pl)|X],X).
sr([-(est)|X],X).
sr([-(er)|X],X).
/******************************************************/
% Some HTML entity words (from &nbsp; and &amp;)
/******************************************************/
sr([nbsp|X],X).
sr([amp|X],X).
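Each sr/2 rule removes one stop word from the front of the list, and simplify/2 cuts after the first match. The same filtering can be sketched with a plain fact table and an if-then-else. This is an alternative formulation, not the author's code: stopword/1 and remove_stopwords/2 are hypothetical names, and the table below holds only a few of the words listed above.

```prolog
% stopword/1: a hypothetical fact table holding only a few of the sr/2 words.
stopword(a).
stopword(and).
stopword(of).
stopword(the).
stopword(to).

% remove_stopwords(+Words, -Kept): Kept is Words without the stop words,
% in the original order.
remove_stopwords([], []).
remove_stopwords([W|Ws], Kept) :-
    (   stopword(W)
    ->  Kept = Rest          % drop the stop word
    ;   Kept = [W|Rest]      % keep any other word
    ),
    remove_stopwords(Ws, Rest).
```

The stop-word list then becomes data kept apart from the rule that applies it, and the if-then-else takes the place of the explicit cut in simplify/2.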
The results continue here; only partial listings are given, since the full output is simply too long.
Option 1: The file contained 6481 tokens, of which 5130 were recognized as words after lexical analysis. That means that 5130 unigram, 5129 bigram and 5128 trigram occurrences were counted (a sequence of N words contains N-1 bigrams and N-2 trigrams).
ug(5130)
ug(the,326)
ug(of,204)
ug(and,139)
ug(in,133)
ug(a,127)
ug(to,110)
ug(emdr,79)
ug(with,64)
ug(amp,61)
ug(is,58)
ug(as,56)
ug(treatment,49)
ug(that,48)
ug(for,38)
ug(patients,33)
ug(shapiro,32)
ug(this,30)
ug(or,28)
ug(be,27)
ug(patient,26)
ug(memory,25)
ug(by,24)
ug(processing,23)
ug(are,23)
ug(traumatic,22)
ug(can,21)
ug(may,20)
ug(it,20)
ug(who,20)
ug(memories,19)
ug(positive,18)
ug(phase,18)
ug(during,18)
ug(an,18)
ug(client,16)
ug(she,16)
ug(from,16)
ug(therapy,16)
bg(5129)
bg(of,the,44)
bg(in,the,36)
bg(the,patient,23)
bg(in,press,15)
bg(of,emdr,13)
bg(treatment,of,13)
bg(and,the,13)
bg(the,client,12)
bg(with,emdr,12)
bg(per,cent,12)
bg(in,a,12)
bg(the,treatment,12)
bg(as,a,11)
bg(as,well,10)
bg(well,as,10)
bg(to,the,10)
bg(to,be,10)
bg(with,the,10)
bg(by,the,9)
bg(such,as,9)
bg(traumatic,memories,9)
bg(of,this,9)
bg(for,the,9)
bg(the,method,8)
bg(the,positive,8)
bg(number,of,8)
bg(can,be,8)
bg(the,memory,8)
bg(it,is,8)
bg(eye,movements,8)
bg(as,the,8)
bg(in,emdr,8)
bg(shapiro,amp,8)
bg(has,been,8)
bg(positive,cognition,7)
bg(see,shapiro,7)
bg(shapiro,a,7)
bg(information,processing,7)
bg(with,a,7)
tg(the,treatment,of,11)
tg(as,well,as,9)
tg(in,the,treatment,7)
tg(amp,silk,forrest,7)
tg(the,positive,cognition,6)
tg(lazrove,amp,fine,6)
tg(amp,fine,in,6)
tg(fine,in,press,6)
tg(shapiro,amp,silk,6)
tg(treated,with,emdr,5)
tg(a,number,of,5)
tg(eye,movement,desensitization,4)
tg(the,aip,model,4)
tg(in,most,cases,4)
tg(in,many,cases,4)
tg(his,or,her,4)
tg(in,press,shapiro,4)
tg(seemed,to,be,3)
tg(emdr,eye,movement,3)
tg(movement,desensitization,and,3)
tg(desensitization,and,reprocessing,3)
tg(who,are,now,3)
tg(are,now,able,3)
tg(now,able,to,3)
[w123,the,patient,s,326,23,3,22494,134926756560]
[w541,processing,of,the,3,44,326,43032,134926756560]
[w541,one,of,the,2,44,326,28688,134926756560]
[w123,the,patient,has,326,23,2,14996,134926756560]
[w541,processing,of,the,3,44,326,43032,134926756560]
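The last two numbers in each w123 and w541 line are the numerator and the divisor T1·T2·T3 of the quantity that generatesentences/0 calls a probability. Here T1, T2 and T3 are the totals ug(5130), bg(5129) and tg(5128). This quantity is a product of relative frequencies used as a ranking score, not a normalized conditional probability. Worked out for the first Option 1 line:

```latex
\[
\text{score}(\textit{the patient s})
  = \frac{F_1 F_2 F_3}{T_1 T_2 T_3}
  = \frac{326 \cdot 23 \cdot 3}{5130 \cdot 5129 \cdot 5128}
  = \frac{22494}{134926756560}
  \approx 1.67 \times 10^{-7}
\]
```

Within one run the divisor is the same for every line, so ranking by the numerator alone, as the program does, gives the same order.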
Option 2: After simplification, i.e. removal of stop words, 2981 words (vs. 5130 before) were analyzed, giving 2980 bigram and 2979 trigram occurrences.
ug(2981)
ug(emdr,79)
ug(amp,61)
ug(treatment,49)
ug(patients,33)
ug(shapiro,32)
ug(patient,26)
ug(memory,25)
ug(processing,23)
ug(traumatic,22)
ug(memories,19)
ug(positive,18)
ug(phase,18)
ug(client,16)
ug(therapy,16)
ug(reprocessing,15)
ug(trauma,15)
ug(press,15)
ug(clinical,15)
ug(method,14)
ug(eye,14)
ug(ptsd,13)
ug(cognition,13)
ug(cases,13)
ug(per,12)
ug(cent,12)
ug(studies,12)
ug(information,12)
ug(desensitization,11)
ug(one,11)
ug(therapeutic,11)
ug(controlled,10)
ug(victims,10)
ug(negative,10)
ug(process,10)
ug(session,10)
ug(sessions,10)
ug(effects,9)
ug(methods,9)
ug(dissociative,9)
ug(self,9)
ug(emotions,9)
ug(emotional,9)
ug(related,9)
ug(disorders,9)
ug(found,9)
ug(disorder,9)
ug(body,8)
ug(number,8)
ug(movements,8)
ug(s,8)
bg(2980)
bg(per,cent,12)
bg(traumatic,memories,9)
bg(eye,movements,8)
bg(shapiro,amp,8)
bg(positive,cognition,7)
bg(information,processing,7)
bg(emdr,treatment,7)
bg(amp,silk,7)
bg(silk,forrest,7)
bg(lazrove,amp,6)
bg(amp,press,6)
bg(emdr,therapy,6)
bg(treatment,plan,6)
bg(eye,movement,5)
bg(re,evaluation,5)
bg(treated,emdr,5)
bg(traumatised,patients,5)
bg(movement,desensitization,4)
bg(shapiro,b,4)
bg(aip,model,4)
bg(press,shapiro,4)
bg(emdr,eye,3)
bg(desensitization,reprocessing,3)
bg(treatment,sessions,3)
bg(controlled,studies,3)
bg(brown,mcgoldrick,3)
bg(mcgoldrick,amp,3)
bg(amp,buchanan,3)
tg(2979)
tg(amp,silk,forrest,7) Note the low frequencies!
tg(lazrove,amp,press,6)
tg(shapiro,amp,silk,6)
tg(eye,movement,desensitization,4)
tg(emdr,eye,movement,3)
tg(movement,desensitization,reprocessing,3)
tg(brown,mcgoldrick,amp,3)
tg(mcgoldrick,amp,buchanan,3)
tg(post,traumatic,stress,3)
tg(traumatic,stress,disorder,3)
tg(accelerated,information,processing,3)
tg(sets,eye,movements,3)
tg(van,der,kolk,3)
tg(shapiro,shapiro,amp,3)
tg(information,processing,system,3)
tg(paulsen,lazrove,amp,3)
tg(per,cent,per,3)
tg(cent,per,cent,3)
tg(positive,effects,emdr,2)
[w123,emdr,treatment,emdr,79,7,1,553,26463589020]
[w541,patients,treated,emdr,2,5,79,790,26463589020]
[w541,outpatients,treated,emdr,1,5,79,395,26463589020]
[w541,successfully,treated,emdr,1,5,79,395,26463589020]
[w541,victims,treated,emdr,1,5,79,395,26463589020]
[w123,emdr,treatment,therapist,79,7,1,553,26463589020]
[w541,first,phase,emdr,1,3,79,237,26463589020]
[w541,final,phase,emdr,1,3,79,237,26463589020]
[w541,desensitization,phase,emdr,1,3,79,237,26463589020]
[w541,positive,effects,emdr,2,2,79,316,26463589020]
[w123,emdr,eye,movement,79,3,3,711,26463589020]
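Chaining the most frequent unigram with bigrams gives “emdr treatment plan”, with frequencies 79, 7 and 6 and a product of 79 × 7 × 6 = 3318.
Using only bigrams, starting from the most frequent one, gives “per cent” (12) followed by “cent per” (3), which is cyclic.
The 2nd highest frequency bigram is “traumatic memories” (9); extending it forwards gives “therapeutic” (1), among other words of frequency 1: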
?- bg(traumatic,W2,F1),bg(W2,W3,F2).
F1 = 9,
F2 = 1,
W2 = memories,
W3 = therapeutic ? ;
F1 = 9,
F2 = 1,
W2 = memories,
W3 = method ? ;
F1 = 9,
F2 = 1,
W2 = memories,
W3 = psychologic ? ;
F1 = 9,
F2 = 1,
W2 = memories,
W3 = areas ? ;
F1 = 9,
F2 = 1,
W2 = memories,
W3 = normal ?
Extending the 2nd highest frequency bigram backwards instead:
?- bg(traumatic,W2,F1),bg(W1,traumatic,F2).
F1 = 9,
F2 = 3,
W1 = post,
W2 = memories ?
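The interactive queries above extend a bigram by hand. A sketch of the greedy version follows: always take the most frequent outgoing bigram, and stop at a length limit, at a dead end, or before a word would repeat, which avoids the “per cent per cent” cycle. The facts are copied from the Option 2 results and the query output above; chain/3 and chain_/4 are hypothetical names.

```prolog
:- use_module(library(lists)).   % last/2, memberchk/2

% Bigram facts copied from the Option 2 results.
bg(per, cent, 12).
bg(cent, per, 3).
bg(traumatic, memories, 9).
bg(memories, therapeutic, 1).

% chain(+Word, +MaxLen, -Words): greedily follow the most frequent bigram.
chain(W, Max, [W|Ws]) :-
    chain_(W, Max, [W], Ws).

chain_(_, Max, _, []) :-
    Max =< 1, !.                          % length limit reached
chain_(W, Max, Seen, [Next|Ws]) :-
    findall(F-N, bg(W, N, F), Pairs),
    keysort(Pairs, Ascending),
    last(Ascending, _-Next),              % most frequent successor; fails if none
    \+ memberchk(Next, Seen), !,          % stop rather than loop, e.g. per cent per
    Max1 is Max - 1,
    chain_(Next, Max1, [Next|Seen], Ws).
chain_(_, _, _, []).                      % dead end or repeated word
```

With these facts, chain(per, 5, L) stops at [per,cent], and chain(traumatic, 5, L) gives [traumatic,memories,therapeutic].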
Option 3: After morphological analysis, 5203 “words” resulted (vs. 5130), because new “words” such as “-(past)” were introduced to mark the past tense of verbs. The unigrams found were almost the same as for option 1.
ug(5203)
ug(the,326)
ug(of,204)
ug(and,139)
ug(in,133)
ug(a,127)
ug(to,110)
ug(emdr,79)
ug(with,64)
ug(amp,61)
ug(is,58)
ug(as,56)
ug(treatment,49)
ug(that,48)
ug(have,43)
ug(for,38)
ug(patients,33)
ug(shapiro,32)
ug(this,30)
ug(-(past),30)
ug(or,28)
ug(be,27)
ug(patient,26)
ug(memory,25)
ug(by,24)
ug(processing,23)
ug(are,23)
ug(traumatic,22)
ug(can,21)
ug(may,20)
ug(it,20)
ug(who,20)
ug(memories,19)
ug(positive,18)
ug(phase,18)
ug(during,18)
ug(well,18)
ug(an,18)
ug(client,16)
ug(she,16)
ug(from,16)
ug(therapy,16)
ug(been,16)
ug(-(sg3),16)
ug(her,15)
ug(reprocessing,15)
ug(trauma,15)
ug(other,15)
ug(press,15)
ug(clinical,15)
ug(method,14)
ug(eye,14)
bg(5202)
bg(of,the,44)
bg(in,the,36)
bg(the,patient,23)
bg(have,-(sg3),16)
bg(in,press,15)
bg(of,emdr,13)
bg(have,-(ed),13)
bg(treatment,of,13)
bg(and,the,13)
bg(the,client,12)
bg(with,emdr,12)
bg(per,cent,12)
bg(in,a,12)
bg(the,treatment,12)
bg(as,a,11)
bg(as,well,10)
bg(well,as,10)
bg(to,the,10)
bg(to,be,10)
bg(with,the,10)
bg(by,the,9)
bg(such,as,9)
bg(traumatic,memories,9)
bg(of,this,9)
tg(5201)
tg(the,treatment,of,11)
tg(as,well,as,9)
tg(have,-(sg3),been,8)
tg(in,the,treatment,7)
tg(amp,silk,forrest,7)
tg(the,positive,cognition,6)
tg(lazrove,amp,fine,6)
tg(amp,fine,in,6)
tg(fine,in,press,6)
tg(shapiro,amp,silk,6)
tg(treated,with,emdr,5)
tg(a,number,of,5)
tg(eye,movement,desensitization,4)
tg(the,aip,model,4)
tg(in,most,cases,4)
tg(have,-(ed),been,4)
tg(in,many,cases,4)
tg(his,or,her,4)
tg(in,press,shapiro,4)
tg(emdr,have,-(sg3),4)
tg(seemed,to,be,3)
tg(emdr,eye,movement,3)
tg(movement,desensitization,and,3)
tg(desensitization,and,reprocessing,3)
tg(who,are,now,3)
tg(are,now,able,3)
tg(now,able,to,3)
tg(find,-(past),that,3)
tg(as,a,method,3)
tg(the,field,of,3)
tg(brown,mcgoldrick,amp,3)
tg(mcgoldrick,amp,buchanan,3)
tg(shapiro,a,b,3)
tg(of,post,traumatic,3)
tg(post,traumatic,stress,3)
tg(traumatic,stress,disorder,3)
tg(per,cent,of,3)
[w123,the,patient,s,326,23,3,22494,140770297206]
[w541,processing,of,the,3,44,326,43032,140770297206]
[w541,one,of,the,2,44,326,28688,140770297206]
Option 4: The results were basically the same as for option 2, i.e. morphological analysis in this case did not help much.
ug(2981)
ug(emdr,79)
ug(amp,61)
ug(treatment,49)
ug(patients,33)
ug(shapiro,32)
ug(patient,26)
ug(memory,25)
ug(processing,23)
ug(traumatic,22)
ug(memories,19)
ug(positive,18)
ug(phase,18)
ug(client,16)
ug(therapy,16)
ug(reprocessing,15)
ug(trauma,15)
ug(press,15)
ug(clinical,15)
ug(method,14)
ug(eye,14)
ug(ptsd,13)
ug(cognition,13)
ug(cases,13)
ug(per,12)
ug(cent,12)
ug(studies,12)
ug(information,12)
ug(desensitization,11)
ug(one,11)
ug(therapeutic,11)
ug(controlled,10)
ug(victims,10)
ug(negative,10)
ug(process,10)
ug(session,10)
ug(sessions,10)
ug(effects,9)
ug(methods,9)
ug(dissociative,9)
ug(self,9)
ug(emotions,9)
ug(emotional,9)
ug(related,9)
ug(disorders,9)
ug(find,9)
bg(2980)
bg(per,cent,12)
bg(traumatic,memories,9)
bg(eye,movements,8)
bg(shapiro,amp,8)
bg(positive,cognition,7)
bg(information,processing,7)
bg(emdr,treatment,7)
bg(amp,silk,7)
bg(silk,forrest,7)
bg(lazrove,amp,6)
bg(amp,press,6)
bg(emdr,therapy,6)
bg(treatment,plan,6)
bg(eye,movement,5)
bg(re,evaluation,5)
bg(treated,emdr,5)
bg(traumatised,patients,5)
bg(movement,desensitization,4)
bg(shapiro,b,4)
bg(aip,model,4)
bg(press,shapiro,4)
bg(emdr,eye,3)
bg(desensitization,reprocessing,3)
bg(treatment,sessions,3)
bg(controlled,studies,3)
tg(2979)
tg(amp,silk,forrest,7)
tg(lazrove,amp,press,6)
tg(shapiro,amp,silk,6)
tg(eye,movement,desensitization,4)
tg(emdr,eye,movement,3)
tg(movement,desensitization,reprocessing,3)
tg(brown,mcgoldrick,amp,3)
tg(mcgoldrick,amp,buchanan,3)
tg(post,traumatic,stress,3)
tg(traumatic,stress,disorder,3)
tg(accelerated,information,processing,3)
tg(sets,eye,movements,3)
tg(van,der,kolk,3)
tg(shapiro,shapiro,amp,3)
tg(information,processing,system,3)
tg(paulsen,lazrove,amp,3)
tg(per,cent,per,3)
tg(cent,per,cent,3)
tg(positive,effects,emdr,2)
tg(shapiro,amp,solomon,2)
tg(wolpe,amp,abrams,2)
tg(solomon,gerrity,amp,2)
tg(gerrity,amp,muff,2)
tg(carlson,chemtob,rusnak,2)
tg(treatment,traumatic,memories,2)
tg(loss,loved,one,2)
tg(sine,amp,sine,2)
tg(information,processing,model,2)
tg(guide,clinical,practice,2)
tg(aip,model,developed,2)
tg(client,history,treatment,2)
tg(history,treatment,planning,2)
tg(criteria,post,traumatic,2)
tg(cognition,feels,completely,2)
tg(eye,movements,initiated,2)
tg(emotions,physical,sensations,2)
tg(positive,cognition,feels,2)
tg(dissonance,regarding,positive,2)
tg(regarding,positive,cognition,2)
tg(phase,emdr,treatment,2)
tg(areas,need,treatment,2)
[w123,emdr,treatment,emdr,79,7,1,553,26463589020]
[w541,patients,treated,emdr,2,5,79,790,26463589020]
[w541,outpatients,treated,emdr,1,5,79,395,26463589020]
[w541,successfully,treated,emdr,1,5,79,395,26463589020]
[w541,victims,treated,emdr,1,5,79,395,26463589020]
[w541,first,phase,emdr,1,3,79,237,26463589020]
[w541,final,phase,emdr,1,3,79,237,26463589020]
[w541,desensitization,phase,emdr,1,3,79,237,26463589020]
[w541,positive,effects,emdr,2,2,79,316,26463589020]
[w541,controlled,studies,emdr,1,2,79,158,26463589020]
[w123,emdr,treatment,therapist,79,7,1,553,26463589020]
[w123,emdr,treatment,assessment,79,7,1,553,26463589020]
[w541,patients,treated,emdr,2,5,79,790,26463589020]
| ?- ug(eye,F0),bg(eye,W2,F1),bg(W2,W3,F2),bg(W3,W4,F3).
F0 = 14,
F1 = 8,
F2 = 2,
F3 = 2,
W1 = eye,
W2 = movements,
W3 = initiated,
W4 = client ? ;
F0 = 14,
F1 = 8,
F2 = 2,
F3 = 1,
W1 = eye,
W2 = movements,
W3 = initiated,
W4 = process ? ;
F0 = 14,
F1 = 8,
F2 = 1,
F3 = 1,
W1 = eye,
W2 = movements,
W3 = studying,
W4 = process ? ;
F0 = 14,
F1 = 8,
F2 = 1,
F3 = 1,
W1 = eye,
W2 = movements,
W3 = initially,
W4 = considered ? ;
F0 = 14,
F1 = 8,
F2 = 1,
F3 = 1,
W1 = eye,
W2 = movements,
W3 = initially,
W4 = emdr ?
yes
European Journal of Clinical Hypnosis
Copyright©2002 European Journal of Clinical Hypnosis
| ?- removehtmltokens([s(<),w([t,i,t,l,e]),s(>),w([s,o,m,e]),w([s,t,r,i,n,g]),s(<),s(/),w([t,i,t,l,e]),s(>)],X).
1 1 Call: removehtmltokens([s(<),w([t,i,t,l,e]),s(>),w([s,o,m,e]),w([s,t,r,i|...]),s(<),s(/),w([...|...]),s(...)],_946) ?
2 2 Call: removehtmltokens([w([s,o,m,e]),w([s,t,r,i,n,g]),s(<),s(/),w([t,i,t,l|...]),s(>)],_946) ?
3 3 Call: removehtmltokens([w([s,t,r,i,n,g]),s(<),s(/),w([t,i,t,l,e]),s(>)],_2457) ?
4 4 Call: removehtmltokens([s(<),s(/),w([t,i,t,l,e]),s(>)],_3057) ?
5 5 Call: removehtmltokens([],_3057) ?
5 5 Exit: removehtmltokens([],[]) ?
4 4 Exit: removehtmltokens([s(<),s(/),w([t,i,t,l,e]),s(>)],[]) ?
3 3 Exit: removehtmltokens([w([s,t,r,i,n,g]),s(<),s(/),w([t,i,t,l,e]),s(>)],[w([s,t,r,i,n,g])]) ?
2 2 Exit: removehtmltokens([w([s,o,m,e]),w([s,t,r,i,n,g]),s(<),s(/),w([t,i,t,l|...]),s(>)],[w([s,o,m,e]),w([s,t,r,i,n,g])]) ?
1 1 Exit: removehtmltokens([s(<),w([t,i,t,l,e]),s(>),w([s,o,m,e]),w([s,t,r,i|...]),s(<),s(/),w([...|...]),s(...)],[w([s,o,m,e]),w([s,t,r,i,n,g])]) ?
X = [w([s,o,m,e]),w([s,t,r,i,n,g])] ?
yes
| ?- cleardata.
1 1 Call: cleardata ?
2 2 Call: retract(user:ug(_1033)) ?
2 2 Exit: retract(user:ug(6)) ?
3 2 Call: assertz(user:ug(0)) ?
3 2 Exit: assertz(user:ug(0)) ?
4 2 Call: retractall(user:ug(_1012,_1013)) ?
4 2 Exit: retractall(user:ug(_1012,_1013)) ?
5 2 Call: retract(user:bg(_1002)) ?
5 2 Exit: retract(user:bg(3)) ?
6 2 Call: assertz(user:bg(0)) ?
6 2 Exit: assertz(user:bg(0)) ?
7 2 Call: retractall(user:bg(_980,_981,_982)) ?
7 2 Exit: retractall(user:bg(_980,_981,_982)) ?
8 2 Call: retract(user:tg(_970)) ?
8 2 Exit: retract(user:tg(0)) ?
9 2 Call: assertz(user:tg(0)) ?
9 2 Exit: assertz(user:tg(0)) ?
10 2 Call: retractall(user:tg(_950,_951,_952,_953)) ?
10 2 Exit: retractall(user:tg(_950,_951,_952,_953)) ?
1 1 Exit: cleardata ?
yes
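According to the trace, cleardata resets the counters ug/1, bg/1 and tg/1 to 0 and removes every stored n-gram (ug/2, bg/3, tg/4). The sketch below follows the same call sequence. The dynamic declarations and the initial counter facts are assumptions. Like the traced version, it expects exactly one counter fact per predicate, because retract/1 fails if none is present.

```prolog
% Assumed database layout: a counter (arity 1) next to the n-gram facts.
:- dynamic ug/1, ug/2, bg/1, bg/3, tg/1, tg/4.

ug(0).
bg(0).
tg(0).

% cleardata: reset each counter to 0 and delete all n-gram facts,
% in the order shown in the trace.
cleardata :-
    retract(ug(_)), assertz(ug(0)), retractall(ug(_, _)),
    retract(bg(_)), assertz(bg(0)), retractall(bg(_, _, _)),
    retract(tg(_)), assertz(tg(0)), retractall(tg(_, _, _, _)).
```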
| ?- flattenlist([[l,i,s,t],[o,f],[l,i,s,t]],X).
1 1 Call: flattenlist([[l,i,s,t],[o,f],[l,i,s,t]],_603) ?
2 2 Call: flattenlist([[l,i,s,t,o,f],[l,i,s,t]],_603) ?
3 3 Call: flattenlist([[l,i,s,t,o,f,l,i|...]],_603) ?
? 3 3 Exit: flattenlist([[l,i,s,t,o,f,l,i|...]],[l,i,s,t,o,f,l,i,s|...]) ?
? 2 2 Exit: flattenlist([[l,i,s,t,o,f],[l,i,s,t]],[l,i,s,t,o,f,l,i,s|...]) ?
? 1 1 Exit: flattenlist([[l,i,s,t],[o,f],[l,i,s,t]],[l,i,s,t,o,f,l,i,s|...]) ?
X = [l,i,s,t,o,f,l,i,s,t] ?
yes
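The trace shows flattenlist/2 appending the first two sublists over and over until a single list remains. The sketch below reproduces that call pattern. The clause for the empty input list is an added assumption and does not appear in the trace.

```prolog
:- use_module(library(lists)).       % append/3

% flattenlist(+ListOfLists, -Flat): concatenate the sublists (one level).
flattenlist([], []).                 % assumption: empty input -> empty list
flattenlist([L], L).                 % one sublist left: that is the result
flattenlist([A, B|Rest], Flat) :-
    append(A, B, AB),                % merge the first two sublists ...
    flattenlist([AB|Rest], Flat).    % ... and recurse on a shorter list
```

Each step copies the accumulated prefix, so the cost grows quadratically with the number of sublists. SWI-Prolog's append/2 from library(lists) concatenates a list of lists in linear time.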
three unigram words
Number of tokens: 3
[w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])]
Number of words: 3
[three,unigram,words]
Number of words for uni-,bi-,trigrams: 3
[three,unigram,words]
Number of unigrams found: 3
Unsorted list: [1-ug(words,1),1-ug(unigram,1),1-ug(three,1)]
Sorted list: [1-ug(words,1),1-ug(unigram,1),1-ug(three,1)]
Number of bigrams found: 2
Unsorted list: [1-bg(unigram,words,1),1-bg(three,unigram,1)]
Sorted list: [1-bg(unigram,words,1),1-bg(three,unigram,1)]
Number of trigrams found: 1
Unsorted list: [1-tg(three,unigram,words,1)]
Sorted list: [1-tg(three,unigram,words,1)]
ug(3)
ug(three,1)
ug(unigram,1)
ug(words,1)
bg(2)
bg(three,unigram,1)
bg(unigram,words,1)
tg(1)
tg(three,unigram,words,1)
[w123,three,unigram,words,1,1,1,1,6]
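The listing above is the output file for the three-word test input. It contains one ug/2, bg/3 and tg/4 fact per n-gram, each preceded by its counter ug/1, bg/1 or tg/1. In the trace, each n-gram is counted by looking it up with clause/2 (existsug/2 yields 0 for an unseen word), adding 1 and asserting the result with asserta/1. The sketch below illustrates the same counting idea for a list of word atoms. It is a simplification, not the original code. The names count_ngrams/1 and bump_ug/1, bump_bg/2, bump_tg/3 are hypothetical. An existing fact is replaced via retract/1 and assertz/1, and the counters and the keysort/restoreterms reordering are left out.

```prolog
:- dynamic ug/2, bg/3, tg/4.

% count_ngrams(+Words): add every unigram, bigram and trigram of Words to
% the database, incrementing the frequency of n-grams already stored.
count_ngrams([]).
count_ngrams([W1|Ws]) :-
    bump_ug(W1),
    (   Ws = [W2|Ws2]
    ->  bump_bg(W1, W2),
        (   Ws2 = [W3|_]
        ->  bump_tg(W1, W2, W3)
        ;   true
        )
    ;   true
    ),
    count_ngrams(Ws).

% Replace an existing fact by an incremented copy, or start at 1.
bump_ug(W) :-
    ( retract(ug(W, F0)) -> F is F0 + 1 ; F = 1 ),
    assertz(ug(W, F)).

bump_bg(W1, W2) :-
    ( retract(bg(W1, W2, F0)) -> F is F0 + 1 ; F = 1 ),
    assertz(bg(W1, W2, F)).

bump_tg(W1, W2, W3) :-
    ( retract(tg(W1, W2, W3, F0)) -> F is F0 + 1 ; F = 1 ),
    assertz(tg(W1, W2, W3, F)).
```

For the test input, `count_ngrams([three,unigram,words])` leaves the same six n-gram facts as the listing above. For a repeated pattern such as `[a,b,a,b]` it yields `ug(a,2)`, `bg(a,b,2)` and two trigrams with frequency 1.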
| ?- run(no,no,no,'unigram_test.txt','unigram_test_out.txt').
1 1 Call: run(no,no,no,'unigram_test.txt','unigram_test_out.txt') ?
2 2 Call: tokenize_file('unigram_test.txt',_1222) ?
3 3 Call: open('unigram_test.txt',read,_2158) ?
3 3 Exit: open('unigram_test.txt',read,'$stream'(270092928)) ?
4 3 Call: tokenize_stream('$stream'(270092928),_1222) ?
5 4 Call: at_end_of_stream('$stream'(270092928)) ?
5 4 Fail: at_end_of_stream('$stream'(270092928)) ?
6 4 Call: tokenize_line_dl('$stream'(270092928),_1222/_3920) ?
7 5 Call: at_end_of_stream('$stream'(270092928)) ?
7 5 Fail: at_end_of_stream('$stream'(270092928)) ?
8 5 Call: get_char_and_type('$stream'(270092928),_4511,_4512) ?
9 6 Call: get_char('$stream'(270092928),_5096) ?
9 6 Exit: get_char('$stream'(270092928),t) ?
10 6 Call: char_type_char(t,_4512,_4511) ?
11 7 Call: char_table(t,_4512,_4511) ?
11 7 Exit: char_table(t,letter,t) ?
10 6 Exit: char_type_char(t,letter,t) ?
8 5 Exit: get_char_and_type('$stream'(270092928),t,letter) ?
12 5 Call: tokenize_line_x(letter,t,'$stream'(270092928),_1222/_3920) ?
13 6 Call: tokenize_letters(letter,t,'$stream'(270092928),_9591,_9582,_9583) ?
14 7 Call: get_char_and_type('$stream'(270092928),_10254,_10255) ?
15 8 Call: get_char('$stream'(270092928),_10885) ?
15 8 Exit: get_char('$stream'(270092928),h) ?
16 8 Call: char_type_char(h,_10255,_10254) ?
17 9 Call: char_table(h,_10255,_10254) ?
17 9 Exit: char_table(h,letter,h) ?
16 8 Exit: char_type_char(h,letter,h) ?
14 7 Exit: get_char_and_type('$stream'(270092928),h,letter) ?
18 7 Call: tokenize_letters(letter,h,'$stream'(270092928),_10263,_9582,_9583) ?
19 8 Call: get_char_and_type('$stream'(270092928),_15433,_15434) ?
20 9 Call: get_char('$stream'(270092928),_16064) ?
20 9 Exit: get_char('$stream'(270092928),r) ?
21 9 Call: char_type_char(r,_15434,_15433) ?
22 10 Call: char_table(r,_15434,_15433) ?
22 10 Exit: char_table(r,letter,r) ?
21 9 Exit: char_type_char(r,letter,r) ?
19 8 Exit: get_char_and_type('$stream'(270092928),r,letter) ?
23 8 Call: tokenize_letters(letter,r,'$stream'(270092928),_15442,_9582,_9583) ?
24 9 Call: get_char_and_type('$stream'(270092928),_20612,_20613) ?
25 10 Call: get_char('$stream'(270092928),_21243) ?
25 10 Exit: get_char('$stream'(270092928),e) ?
26 10 Call: char_type_char(e,_20613,_20612) ?
27 11 Call: char_table(e,_20613,_20612) ?
27 11 Exit: char_table(e,letter,e) ?
26 10 Exit: char_type_char(e,letter,e) ?
24 9 Exit: get_char_and_type('$stream'(270092928),e,letter) ?
28 9 Call: tokenize_letters(letter,e,'$stream'(270092928),_20621,_9582,_9583) ?
29 10 Call: get_char_and_type('$stream'(270092928),_25791,_25792) ?
30 11 Call: get_char('$stream'(270092928),_26422) ?
30 11 Exit: get_char('$stream'(270092928),e) ?
31 11 Call: char_type_char(e,_25792,_25791) ?
32 12 Call: char_table(e,_25792,_25791) ?
32 12 Exit: char_table(e,letter,e) ?
31 11 Exit: char_type_char(e,letter,e) ?
29 10 Exit: get_char_and_type('$stream'(270092928),e,letter) ?
33 10 Call: tokenize_letters(letter,e,'$stream'(270092928),_25800,_9582,_9583) ?
34 11 Call: get_char_and_type('$stream'(270092928),_30970,_30971) ?
35 12 Call: get_char('$stream'(270092928),_31601) ?
35 12 Exit: get_char('$stream'(270092928),' ') ?
36 12 Call: char_type_char(' ',_30971,_30970) ?
37 13 Call: char_table(' ',_30971,_30970) ?
37 13 Exit: char_table(' ',whitespace,' ') ?
36 12 Exit: char_type_char(' ',whitespace,' ') ?
34 11 Exit: get_char_and_type('$stream'(270092928),' ',whitespace) ?
38 11 Call: tokenize_letters(whitespace,' ','$stream'(270092928),_30979,_9582,_9583) ?
38 11 Exit: tokenize_letters(whitespace,' ','$stream'(270092928),[],whitespace,' ') ?
33 10 Exit: tokenize_letters(letter,e,'$stream'(270092928),[e],whitespace,' ') ?
28 9 Exit: tokenize_letters(letter,e,'$stream'(270092928),[e,e],whitespace,' ') ?
23 8 Exit: tokenize_letters(letter,r,'$stream'(270092928),[r,e,e],whitespace,' ') ?
18 7 Exit: tokenize_letters(letter,h,'$stream'(270092928),[h,r,e,e],whitespace,' ') ?
13 6 Exit: tokenize_letters(letter,t,'$stream'(270092928),[t,h,r,e,e],whitespace,' ') ?
39 6 Call: tokenize_line_x(whitespace,' ','$stream'(270092928),_9593/_3920) ?
40 7 Call: tokenize_line_dl('$stream'(270092928),_9593/_3920) ?
41 8 Call: at_end_of_stream('$stream'(270092928)) ?
41 8 Fail: at_end_of_stream('$stream'(270092928)) ?
42 8 Call: get_char_and_type('$stream'(270092928),_40926,_40927) ?
43 9 Call: get_char('$stream'(270092928),_41511) ?
43 9 Exit: get_char('$stream'(270092928),u) ?
44 9 Call: char_type_char(u,_40927,_40926) ?
45 10 Call: char_table(u,_40927,_40926) ?
45 10 Exit: char_table(u,letter,u) ?
44 9 Exit: char_type_char(u,letter,u) ?
42 8 Exit: get_char_and_type('$stream'(270092928),u,letter) ?
46 8 Call: tokenize_line_x(letter,u,'$stream'(270092928),_9593/_3920) ?
47 9 Call: tokenize_letters(letter,u,'$stream'(270092928),_46006,_45997,_45998) ?
48 10 Call: get_char_and_type('$stream'(270092928),_46669,_46670) ?
49 11 Call: get_char('$stream'(270092928),_47300) ?
49 11 Exit: get_char('$stream'(270092928),n) ?
50 11 Call: char_type_char(n,_46670,_46669) ?
51 12 Call: char_table(n,_46670,_46669) ?
51 12 Exit: char_table(n,letter,n) ?
50 11 Exit: char_type_char(n,letter,n) ?
48 10 Exit: get_char_and_type('$stream'(270092928),n,letter) ?
52 10 Call: tokenize_letters(letter,n,'$stream'(270092928),_46678,_45997,_45998) ?
53 11 Call: get_char_and_type('$stream'(270092928),_51848,_51849) ?
54 12 Call: get_char('$stream'(270092928),_52479) ?
54 12 Exit: get_char('$stream'(270092928),i) ?
55 12 Call: char_type_char(i,_51849,_51848) ?
56 13 Call: char_table(i,_51849,_51848) ?
56 13 Exit: char_table(i,letter,i) ?
55 12 Exit: char_type_char(i,letter,i) ?
53 11 Exit: get_char_and_type('$stream'(270092928),i,letter) ?
57 11 Call: tokenize_letters(letter,i,'$stream'(270092928),_51857,_45997,_45998) ?
58 12 Call: get_char_and_type('$stream'(270092928),_57027,_57028) ?
59 13 Call: get_char('$stream'(270092928),_57658) ?
59 13 Exit: get_char('$stream'(270092928),g) ?
60 13 Call: char_type_char(g,_57028,_57027) ?
61 14 Call: char_table(g,_57028,_57027) ?
61 14 Exit: char_table(g,letter,g) ?
60 13 Exit: char_type_char(g,letter,g) ?
58 12 Exit: get_char_and_type('$stream'(270092928),g,letter) ?
62 12 Call: tokenize_letters(letter,g,'$stream'(270092928),_57036,_45997,_45998) ?
63 13 Call: get_char_and_type('$stream'(270092928),_62206,_62207) ?
64 14 Call: get_char('$stream'(270092928),_62837) ?
64 14 Exit: get_char('$stream'(270092928),r) ?
65 14 Call: char_type_char(r,_62207,_62206) ?
66 15 Call: char_table(r,_62207,_62206) ?
66 15 Exit: char_table(r,letter,r) ?
65 14 Exit: char_type_char(r,letter,r) ?
63 13 Exit: get_char_and_type('$stream'(270092928),r,letter) ?
67 13 Call: tokenize_letters(letter,r,'$stream'(270092928),_62215,_45997,_45998) ?
68 14 Call: get_char_and_type('$stream'(270092928),_67385,_67386) ?
69 15 Call: get_char('$stream'(270092928),_68016) ?
69 15 Exit: get_char('$stream'(270092928),a) ?
70 15 Call: char_type_char(a,_67386,_67385) ?
71 16 Call: char_table(a,_67386,_67385) ?
71 16 Exit: char_table(a,letter,a) ?
70 15 Exit: char_type_char(a,letter,a) ?
68 14 Exit: get_char_and_type('$stream'(270092928),a,letter) ?
72 14 Call: tokenize_letters(letter,a,'$stream'(270092928),_67394,_45997,_45998) ?
73 15 Call: get_char_and_type('$stream'(270092928),_72564,_72565) ?
74 16 Call: get_char('$stream'(270092928),_73195) ?
74 16 Exit: get_char('$stream'(270092928),m) ?
75 16 Call: char_type_char(m,_72565,_72564) ?
76 17 Call: char_table(m,_72565,_72564) ?
76 17 Exit: char_table(m,letter,m) ?
75 16 Exit: char_type_char(m,letter,m) ?
73 15 Exit: get_char_and_type('$stream'(270092928),m,letter) ?
77 15 Call: tokenize_letters(letter,m,'$stream'(270092928),_72573,_45997,_45998) ?
78 16 Call: get_char_and_type('$stream'(270092928),_77743,_77744) ?
79 17 Call: get_char('$stream'(270092928),_78374) ?
79 17 Exit: get_char('$stream'(270092928),' ') ?
80 17 Call: char_type_char(' ',_77744,_77743) ?
81 18 Call: char_table(' ',_77744,_77743) ?
81 18 Exit: char_table(' ',whitespace,' ') ?
80 17 Exit: char_type_char(' ',whitespace,' ') ?
78 16 Exit: get_char_and_type('$stream'(270092928),' ',whitespace) ?
82 16 Call: tokenize_letters(whitespace,' ','$stream'(270092928),_77752,_45997,_45998) ?
82 16 Exit: tokenize_letters(whitespace,' ','$stream'(270092928),[],whitespace,' ') ?
77 15 Exit: tokenize_letters(letter,m,'$stream'(270092928),[m],whitespace,' ') ?
72 14 Exit: tokenize_letters(letter,a,'$stream'(270092928),[a,m],whitespace,' ') ?
67 13 Exit: tokenize_letters(letter,r,'$stream'(270092928),[r,a,m],whitespace,' ') ?
62 12 Exit: tokenize_letters(letter,g,'$stream'(270092928),[g,r,a,m],whitespace,' ') ?
57 11 Exit: tokenize_letters(letter,i,'$stream'(270092928),[i,g,r,a,m],whitespace,' ') ?
52 10 Exit: tokenize_letters(letter,n,'$stream'(270092928),[n,i,g,r,a,m],whitespace,' ') ?
47 9 Exit: tokenize_letters(letter,u,'$stream'(270092928),[u,n,i,g,r,a,m],whitespace,' ') ?
83 9 Call: tokenize_line_x(whitespace,' ','$stream'(270092928),_46008/_3920) ?
84 10 Call: tokenize_line_dl('$stream'(270092928),_46008/_3920) ?
85 11 Call: at_end_of_stream('$stream'(270092928)) ?
85 11 Fail: at_end_of_stream('$stream'(270092928)) ?
86 11 Call: get_char_and_type('$stream'(270092928),_88865,_88866) ?
87 12 Call: get_char('$stream'(270092928),_89450) ?
87 12 Exit: get_char('$stream'(270092928),w) ?
88 12 Call: char_type_char(w,_88866,_88865) ?
89 13 Call: char_table(w,_88866,_88865) ?
89 13 Exit: char_table(w,letter,w) ?
88 12 Exit: char_type_char(w,letter,w) ?
86 11 Exit: get_char_and_type('$stream'(270092928),w,letter) ?
90 11 Call: tokenize_line_x(letter,w,'$stream'(270092928),_46008/_3920) ?
91 12 Call: tokenize_letters(letter,w,'$stream'(270092928),_93945,_93936,_93937) ?
92 13 Call: get_char_and_type('$stream'(270092928),_94608,_94609) ?
93 14 Call: get_char('$stream'(270092928),_95239) ?
93 14 Exit: get_char('$stream'(270092928),o) ?
94 14 Call: char_type_char(o,_94609,_94608) ?
95 15 Call: char_table(o,_94609,_94608) ?
95 15 Exit: char_table(o,letter,o) ?
94 14 Exit: char_type_char(o,letter,o) ?
92 13 Exit: get_char_and_type('$stream'(270092928),o,letter) ?
96 13 Call: tokenize_letters(letter,o,'$stream'(270092928),_94617,_93936,_93937) ?
97 14 Call: get_char_and_type('$stream'(270092928),_99787,_99788) ?
98 15 Call: get_char('$stream'(270092928),_100418) ?
98 15 Exit: get_char('$stream'(270092928),r) ?
99 15 Call: char_type_char(r,_99788,_99787) ?
100 16 Call: char_table(r,_99788,_99787) ?
100 16 Exit: char_table(r,letter,r) ?
99 15 Exit: char_type_char(r,letter,r) ?
97 14 Exit: get_char_and_type('$stream'(270092928),r,letter) ?
101 14 Call: tokenize_letters(letter,r,'$stream'(270092928),_99796,_93936,_93937) ?
102 15 Call: get_char_and_type('$stream'(270092928),_104966,_104967) ?
103 16 Call: get_char('$stream'(270092928),_105597) ?
103 16 Exit: get_char('$stream'(270092928),d) ?
104 16 Call: char_type_char(d,_104967,_104966) ?
105 17 Call: char_table(d,_104967,_104966) ?
105 17 Exit: char_table(d,letter,d) ?
104 16 Exit: char_type_char(d,letter,d) ?
102 15 Exit: get_char_and_type('$stream'(270092928),d,letter) ?
106 15 Call: tokenize_letters(letter,d,'$stream'(270092928),_104975,_93936,_93937) ?
107 16 Call: get_char_and_type('$stream'(270092928),_110145,_110146) ?
108 17 Call: get_char('$stream'(270092928),_110776) ?
108 17 Exit: get_char('$stream'(270092928),s) ?
109 17 Call: char_type_char(s,_110146,_110145) ?
110 18 Call: char_table(s,_110146,_110145) ?
110 18 Exit: char_table(s,letter,s) ?
109 17 Exit: char_type_char(s,letter,s) ?
107 16 Exit: get_char_and_type('$stream'(270092928),s,letter) ?
111 16 Call: tokenize_letters(letter,s,'$stream'(270092928),_110154,_93936,_93937) ?
112 17 Call: get_char_and_type('$stream'(270092928),_115324,_115325) ?
113 18 Call: get_char('$stream'(270092928),_115955) ?
113 18 Exit: get_char('$stream'(270092928),end_of_file) ?
114 18 Call: char_type_char(end_of_file,_115325,_115324) ?
115 19 Call: char_table(end_of_file,_115325,_115324) ?
115 19 Exit: char_table(end_of_file,eol,end_of_file) ?
114 18 Exit: char_type_char(end_of_file,eol,end_of_file) ?
112 17 Exit: get_char_and_type('$stream'(270092928),end_of_file,eol) ?
116 17 Call: tokenize_letters(eol,end_of_file,'$stream'(270092928),_115333,_93936,_93937) ?
116 17 Exit: tokenize_letters(eol,end_of_file,'$stream'(270092928),[],eol,end_of_file) ?
111 16 Exit: tokenize_letters(letter,s,'$stream'(270092928),[s],eol,end_of_file) ?
106 15 Exit: tokenize_letters(letter,d,'$stream'(270092928),[d,s],eol,end_of_file) ?
101 14 Exit: tokenize_letters(letter,r,'$stream'(270092928),[r,d,s],eol,end_of_file) ?
96 13 Exit: tokenize_letters(letter,o,'$stream'(270092928),[o,r,d,s],eol,end_of_file) ?
91 12 Exit: tokenize_letters(letter,w,'$stream'(270092928),[w,o,r,d,s],eol,end_of_file) ?
117 12 Call: tokenize_line_x(eol,end_of_file,'$stream'(270092928),_93947/_3920) ?
117 12 Exit: tokenize_line_x(eol,end_of_file,'$stream'(270092928),_3920/_3920) ?
90 11 Exit: tokenize_line_x(letter,w,'$stream'(270092928),[w([w,o,r,d,s])|_3920]/_3920) ?
84 10 Exit: tokenize_line_dl('$stream'(270092928),[w([w,o,r,d,s])|_3920]/_3920) ?
83 9 Exit: tokenize_line_x(whitespace,' ','$stream'(270092928),[w([w,o,r,d,s])|_3920]/_3920) ?
46 8 Exit: tokenize_line_x(letter,u,'$stream'(270092928),[w([u,n,i,g,r,a|...]),w([w,o,r,d,s])|_3920]/_3920) ?
40 7 Exit: tokenize_line_dl('$stream'(270092928),[w([u,n,i,g,r,a|...]),w([w,o,r,d,s])|_3920]/_3920) ?
39 6 Exit: tokenize_line_x(whitespace,' ','$stream'(270092928),[w([u,n,i,g,r,a|...]),w([w,o,r,d,s])|_3920]/_3920) ?
12 5 Exit: tokenize_line_x(letter,t,'$stream'(270092928),[w([t,h,r,e,e]),w([u,n,i,g,r,a|...]),w([w,o,r,d,s])|_3920]/_3920) ?
6 4 Exit: tokenize_line_dl('$stream'(270092928),[w([t,h,r,e,e]),w([u,n,i,g,r,a|...]),w([w,o,r,d,s])|_3920]/_3920) ?
118 4 Call: tokenize_stream('$stream'(270092928),_3920) ?
119 5 Call: at_end_of_stream('$stream'(270092928)) ?
119 5 Exit: at_end_of_stream('$stream'(270092928)) ?
118 4 Exit: tokenize_stream('$stream'(270092928),[]) ?
4 3 Exit: tokenize_stream('$stream'(270092928),[w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])]) ?
120 3 Call: close('$stream'(270092928)) ?
120 3 Exit: close('$stream'(270092928)) ?
2 2 Exit: tokenize_file('unigram_test.txt',[w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])]) ?
121 2 Call: tell('unigram_test_out.txt') ?
121 2 Exit: tell('unigram_test_out.txt') ?
122 2 Call: write('Number of tokens: ') ?
122 2 Exit: write('Number of tokens: ') ?
123 2 Call: length([w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])],_1206) ?
123 2 Exit: length([w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])],3) ?
124 2 Call: write(3) ?
124 2 Exit: write(3) ?
125 2 Call: nl ?
125 2 Exit: nl ?
126 2 Call: write([w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])]) ?
126 2 Exit: write([w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])]) ?
127 2 Call: nl ?
127 2 Exit: nl ?
128 2 Call: nl ?
128 2 Exit: nl ?
129 2 Call: no==yes ?
129 2 Fail: no==yes ?
130 2 Call: _1172=[w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])] ?
130 2 Exit: [w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])]=[w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])] ?
131 2 Call: tokens_words([w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])],_1163) ?
132 3 Call: atom_chars(_152142,[t,h,r,e,e]) ?
132 3 Exit: atom_chars(three,[t,h,r,e,e]) ?
133 3 Call: tokens_words([w([u,n,i,g,r,a,m]),w([w,o,r,d,s])],_152143) ?
134 4 Call: atom_chars(_153914,[u,n,i,g,r,a,m]) ?
134 4 Exit: atom_chars(unigram,[u,n,i,g,r,a,m]) ?
135 4 Call: tokens_words([w([w,o,r,d,s])],_153915) ?
136 5 Call: atom_chars(_155686,[w,o,r,d,s]) ?
136 5 Exit: atom_chars(words,[w,o,r,d,s]) ?
137 5 Call: tokens_words([],_155687) ?
137 5 Exit: tokens_words([],[]) ?
135 4 Exit: tokens_words([w([w,o,r,d,s])],[words]) ?
133 3 Exit: tokens_words([w([u,n,i,g,r,a,m]),w([w,o,r,d,s])],[unigram,words]) ?
131 2 Exit: tokens_words([w([t,h,r,e,e]),w([u,n,i,g,r,a,m]),w([w,o,r,d,s])],[three,unigram,words]) ?
138 2 Call: write('Number of words: ') ?
138 2 Exit: write('Number of words: ') ?
139 2 Call: length([three,unigram,words],_1152) ?
139 2 Exit: length([three,unigram,words],3) ?
140 2 Call: write(3) ?
140 2 Exit: write(3) ?
141 2 Call: nl ?
141 2 Exit: nl ?
142 2 Call: write([three,unigram,words]) ?
142 2 Exit: write([three,unigram,words]) ?
143 2 Call: nl ?
143 2 Exit: nl ?
144 2 Call: nl ?
144 2 Exit: nl ?
145 2 Call: no==yes ?
145 2 Fail: no==yes ?
146 2 Call: _1118=[three,unigram,words] ?
146 2 Exit: [three,unigram,words]=[three,unigram,words] ?
147 2 Call: no==yes ?
147 2 Fail: no==yes ?
148 2 Call: _1100=[three,unigram,words] ?
148 2 Exit: [three,unigram,words]=[three,unigram,words] ?
149 2 Call: write('Number of words for uni-,bi-,trigrams: ') ?
149 2 Exit: write('Number of words for uni-,bi-,trigrams: ') ?
150 2 Call: length([three,unigram,words],_1086) ?
150 2 Exit: length([three,unigram,words],3) ?
151 2 Call: write(3) ?
151 2 Exit: write(3) ?
152 2 Call: nl ?
152 2 Exit: nl ?
153 2 Call: write([three,unigram,words]) ?
153 2 Exit: write([three,unigram,words]) ?
154 2 Call: nl ?
154 2 Exit: nl ?
155 2 Call: nl ?
155 2 Exit: nl ?
156 2 Call: unigram([three,unigram,words]) ?
157 3 Call: ug(_191167) ?
157 3 Exit: ug(0) ?
158 3 Call: unigram([three,unigram,words],0) ?
159 4 Call: existsug(three,_192853) ?
160 5 Call: clause(user:ug(three,_192853),true) ?
160 5 Fail: clause(user:ug(three,_192853),true) ?
159 4 Exit: existsug(three,0) ?
161 4 Call: _192846 is 0+1 ?
161 4 Exit: 1 is 0+1 ?
162 4 Call: _192837 is 0+1 ?
162 4 Exit: 1 is 0+1 ?
163 4 Call: asserta(user:ug(three,1)) ?
163 4 Exit: asserta(user:ug(three,1)) ?
164 4 Call: unigram([unigram,words],1) ?
165 5 Call: existsug(unigram,_198548) ?
166 6 Call: clause(user:ug(unigram,_198548),true) ?
166 6 Fail: clause(user:ug(unigram,_198548),true) ?
165 5 Exit: existsug(unigram,0) ?
167 5 Call: _198541 is 0+1 ?
167 5 Exit: 1 is 0+1 ?
168 5 Call: _198532 is 1+1 ?
168 5 Exit: 2 is 1+1 ?
169 5 Call: asserta(user:ug(unigram,1)) ?
169 5 Exit: asserta(user:ug(unigram,1)) ?
170 5 Call: unigram([words],2) ?
171 6 Call: existsug(words,_204243) ?
172 7 Call: clause(user:ug(words,_204243),true) ?
172 7 Fail: clause(user:ug(words,_204243),true) ?
171 6 Exit: existsug(words,0) ?
173 6 Call: _204236 is 0+1 ?
173 6 Exit: 1 is 0+1 ?
174 6 Call: _204227 is 2+1 ?
174 6 Exit: 3 is 2+1 ?
175 6 Call: asserta(user:ug(words,1)) ?
175 6 Exit: asserta(user:ug(words,1)) ?
176 6 Call: unigram([],3) ?
177 7 Call: retract(user:ug(_209995)) ?
177 7 Exit: retract(user:ug(0)) ?
178 7 Call: assertz(user:ug(3)) ?
178 7 Exit: assertz(user:ug(3)) ?
179 7 Call: write('Number of unigrams found: ') ?
179 7 Exit: write('Number of unigrams found: ') ?
180 7 Call: write(3) ?
180 7 Exit: write(3) ?
181 7 Call: nl ?
181 7 Exit: nl ?
182 7 Call: findall(_209962-ug(_209959,_209962),user:ug(_209959,_209962),_209967) ?
183 8 Call: ug(_209959,_209962) ?
? 183 8 Exit: ug(words,1) ?
183 8 Redo: ug(words,1) ?
? 183 8 Exit: ug(unigram,1) ?
183 8 Redo: ug(unigram,1) ?
183 8 Exit: ug(three,1) ?
182 7 Exit: findall(_209962-ug(_209959,_209962),user:ug(_209959,_209962),[1-ug(words,1),1-ug(unigram,1),1-ug(three,1)]) ?
184 7 Call: write('Unsorted list: ') ?
184 7 Exit: write('Unsorted list: ') ?
185 7 Call: write([1-ug(words,1),1-ug(unigram,1),1-ug(three,1)]) ?
185 7 Exit: write([1-ug(words,1),1-ug(unigram,1),1-ug(three,1)]) ?
186 7 Call: nl ?
186 7 Exit: nl ?
187 7 Call: nl ?
187 7 Exit: nl ?
188 7 Call: keysort([1-ug(words,1),1-ug(unigram,1),1-ug(three,1)],_209932) ?
188 7 Exit: keysort([1-ug(words,1),1-ug(unigram,1),1-ug(three,1)],[1-ug(words,1),1-ug(unigram,1),1-ug(three,1)]) ?
189 7 Call: write('Sorted list: ') ?
189 7 Exit: write('Sorted list: ') ?
190 7 Call: write([1-ug(words,1),1-ug(unigram,1),1-ug(three,1)]) ?
190 7 Exit: write([1-ug(words,1),1-ug(unigram,1),1-ug(three,1)]) ?
191 7 Call: nl ?
191 7 Exit: nl ?
192 7 Call: nl ?
192 7 Exit: nl ?
193 7 Call: retractall(user:ug(_209904,_209905)) ?
193 7 Exit: retractall(user:ug(_209904,_209905)) ?
194 7 Call: restoreterms([1-ug(words,1),1-ug(unigram,1),1-ug(three,1)]) ?
195 8 Call: asserta(user:ug(words,1)) ?
195 8 Exit: asserta(user:ug(words,1)) ?
196 8 Call: restoreterms([1-ug(unigram,1),1-ug(three,1)]) ?
197 9 Call: asserta(user:ug(unigram,1)) ?
197 9 Exit: asserta(user:ug(unigram,1)) ?
198 9 Call: restoreterms([1-ug(three,1)]) ?
199 10 Call: asserta(user:ug(three,1)) ?
199 10 Exit: asserta(user:ug(three,1)) ?
200 10 Call: restoreterms([]) ?
200 10 Exit: restoreterms([]) ?
198 9 Exit: restoreterms([1-ug(three,1)]) ?
196 8 Exit: restoreterms([1-ug(unigram,1),1-ug(three,1)]) ?
194 7 Exit: restoreterms([1-ug(words,1),1-ug(unigram,1),1-ug(three,1)]) ?
176 6 Exit: unigram([],3) ?
170 5 Exit: unigram([words],2) ?
164 4 Exit: unigram([unigram,words],1) ?
158 3 Exit: unigram([three,unigram,words],0) ?
156 2 Exit: unigram([three,unigram,words]) ?
201 2 Call: bigram([three,unigram,words]) ?
202 3 Call: bg(_247307) ?
202 3 Exit: bg(0) ?
203 3 Call: bigram([three,unigram,words],0) ?
204 4 Call: existsbg(three,unigram,_249000) ?
205 5 Call: clause(user:bg(three,unigram,_249000),true) ?
205 5 Fail: clause(user:bg(three,unigram,_249000),true) ?
204 4 Exit: existsbg(three,unigram,0) ?
206 4 Call: _248992 is 0+1 ?
206 4 Exit: 1 is 0+1 ?
207 4 Call: _248983 is 0+1 ?
207 4 Exit: 1 is 0+1 ?
208 4 Call: asserta(user:bg(three,unigram,1)) ?
208 4 Exit: asserta(user:bg(three,unigram,1)) ?
209 4 Call: bigram([unigram,words],1) ?
210 5 Call: existsbg(unigram,words,_254813) ?
211 6 Call: clause(user:bg(unigram,words,_254813),true) ?
211 6 Fail: clause(user:bg(unigram,words,_254813),true) ?
210 5 Exit: existsbg(unigram,words,0) ?
212 5 Call: _254805 is 0+1 ?
212 5 Exit: 1 is 0+1 ?
213 5 Call: _254796 is 1+1 ?
213 5 Exit: 2 is 1+1 ?
214 5 Call: asserta(user:bg(unigram,words,1)) ?
214 5 Exit: asserta(user:bg(unigram,words,1)) ?
215 5 Call: bigram([words],2) ?
216 6 Call: retract(user:bg(_260682)) ?
216 6 Exit: retract(user:bg(0)) ?
217 6 Call: assertz(user:bg(2)) ?
217 6 Exit: assertz(user:bg(2)) ?
218 6 Call: write('Number of bigrams found: ') ?
218 6 Exit: write('Number of bigrams found: ') ?
219 6 Call: write(2) ?
219 6 Exit: write(2) ?
220 6 Call: nl ?
220 6 Exit: nl ?
221 6 Call: findall(_260649-bg(_260645,_260646,_260649),user:bg(_260645,_260646,_260649),_260654) ?
222 7 Call: bg(_260645,_260646,_260649) ?
? 222 7 Exit: bg(unigram,words,1) ?
222 7 Redo: bg(unigram,words,1) ?
222 7 Exit: bg(three,unigram,1) ?
221 6 Exit: findall(_260649-bg(_260645,_260646,_260649),user:bg(_260645,_260646,_260649),[1-bg(unigram,words,1),1-bg(three,unigram,1)]) ?
223 6 Call: write('Unsorted list: ') ?
223 6 Exit: write('Unsorted list: ') ?
224 6 Call: write([1-bg(unigram,words,1),1-bg(three,unigram,1)]) ?
224 6 Exit: write([1-bg(unigram,words,1),1-bg(three,unigram,1)]) ?
225 6 Call: nl ?
225 6 Exit: nl ?
226 6 Call: nl ?
226 6 Exit: nl ?
227 6 Call: keysort([1-bg(unigram,words,1),1-bg(three,unigram,1)],_260617) ?
227 6 Exit: keysort([1-bg(unigram,words,1),1-bg(three,unigram,1)],[1-bg(unigram,words,1),1-bg(three,unigram,1)]) ?
228 6 Call: write('Sorted list: ') ?
228 6 Exit: write('Sorted list: ') ?
229 6 Call: write([1-bg(unigram,words,1),1-bg(three,unigram,1)]) ?
229 6 Exit: write([1-bg(unigram,words,1),1-bg(three,unigram,1)]) ?
230 6 Call: nl ?
230 6 Exit: nl ?
231 6 Call: nl ?
231 6 Exit: nl ?
232 6 Call: retractall(user:bg(_260588,_260589,_260590)) ?
232 6 Exit: retractall(user:bg(_260588,_260589,_260590)) ?
233 6 Call: restoreterms([1-bg(unigram,words,1),1-bg(three,unigram,1)]) ?
234 7 Call: asserta(user:bg(unigram,words,1)) ?
234 7 Exit: asserta(user:bg(unigram,words,1)) ?
235 7 Call: restoreterms([1-bg(three,unigram,1)]) ?
236 8 Call: asserta(user:bg(three,unigram,1)) ?
236 8 Exit: asserta(user:bg(three,unigram,1)) ?
237 8 Call: restoreterms([]) ?
237 8 Exit: restoreterms([]) ?
235 7 Exit: restoreterms([1-bg(three,unigram,1)]) ?
233 6 Exit: restoreterms([1-bg(unigram,words,1),1-bg(three,unigram,1)]) ?
? 215 5 Exit: bigram([words],2) ?
? 209 4 Exit: bigram([unigram,words],1) ?
? 203 3 Exit: bigram([three,unigram,words],0) ?
201 2 Exit: bigram([three,unigram,words]) ?
238 2 Call: trigram([three,unigram,words]) ?
239 3 Call: tg(_295371) ?
239 3 Exit: tg(0) ?
240 3 Call: trigram([three,unigram,words],0) ?
241 4 Call: existstg(three,unigram,words,_297068) ?
242 5 Call: clause(user:tg(three,unigram,words,_297068),true) ?
242 5 Fail: clause(user:tg(three,unigram,words,_297068),true) ?
241 4 Exit: existstg(three,unigram,words,0) ?
243 4 Call: _297059 is 0+1 ?
243 4 Exit: 1 is 0+1 ?
244 4 Call: _297050 is 0+1 ?
244 4 Exit: 1 is 0+1 ?
245 4 Call: asserta(user:tg(three,unigram,words,1)) ?
245 4 Exit: asserta(user:tg(three,unigram,words,1)) ?
246 4 Call: trigram([unigram,words],1) ?
247 5 Call: retract(user:tg(_303051)) ?
247 5 Exit: retract(user:tg(0)) ?
248 5 Call: assertz(user:tg(1)) ?
248 5 Exit: assertz(user:tg(1)) ?
249 5 Call: write('Number of trigrams found: ') ?
249 5 Exit: write('Number of trigrams found: ') ?
250 5 Call: write(1) ?
250 5 Exit: write(1) ?
251 5 Call: nl ?
251 5 Exit: nl ?
252 5 Call: findall(_303018-tg(_303013,_303014,_303015,_303018),user:tg(_303013,_303014,_303015,_303018),_303023) ?
253 6 Call: tg(_303013,_303014,_303015,_303018) ?
253 6 Exit: tg(three,unigram,words,1) ?
252 5 Exit: findall(_303018-tg(_303013,_303014,_303015,_303018),user:tg(_303013,_303014,_303015,_303018),[1-tg(three,unigram,words,1)]) ?
254 5 Call: write('Unsorted list: ') ?
254 5 Exit: write('Unsorted list: ') ?
255 5 Call: write([1-tg(three,unigram,words,1)]) ?
255 5 Exit: write([1-tg(three,unigram,words,1)]) ?
256 5 Call: nl ?
256 5 Exit: nl ?
257 5 Call: nl ?
257 5 Exit: nl ?
258 5 Call: keysort([1-tg(three,unigram,words,1)],_367) ?
258 5 Exit: keysort([1-tg(three,unigram,words,1)],[1-tg(three,unigram,words,1)]) ?
259 5 Call: write('Sorted list: ') ?
259 5 Exit: write('Sorted list: ') ?
260 5 Call: write([1-tg(three,unigram,words,1)]) ?
260 5 Exit: write([1-tg(three,unigram,words,1)]) ?
261 5 Call: nl ?
261 5 Exit: nl ?
262 5 Call: nl ?
262 5 Exit: nl ?
263 5 Call: retractall(user:tg(_337,_338,_339,_340)) ?
263 5 Exit: retractall(user:tg(_337,_338,_339,_340)) ?
264 5 Call: restoreterms([1-tg(three,unigram,words,1)]) ?
265 6 Call: asserta(user:tg(three,unigram,words,1)) ?
265 6 Exit: asserta(user:tg(three,unigram,words,1)) ?
266 6 Call: restoreterms([]) ?
266 6 Exit: restoreterms([]) ?
264 5 Exit: restoreterms([1-tg(three,unigram,words,1)]) ?
? 246 4 Exit: trigram([unigram,words],1) ?
? 240 3 Exit: trigram([three,unigram,words],0) ?
? 238 2 Exit: trigram([three,unigram,words]) ?
267 2 Call: saveubtgrams ?
268 3 Call: ug(_19071) ?
268 3 Exit: ug(3) ?
269 3 Call: write(ug(3)) ?
269 3 Exit: write(ug(3)) ?
270 3 Call: nl ?
270 3 Exit: nl ?
271 3 Call: findall(ug(_19051,_19052),user:ug(_19051,_19052),_19056) ?
272 4 Call: ug(_19051,_19052) ?
? 272 4 Exit: ug(three,1) ?
272 4 Redo: ug(three,1) ?
? 272 4 Exit: ug(unigram,1) ?
272 4 Redo: ug(unigram,1) ?
272 4 Exit: ug(words,1) ?
271 3 Exit: findall(ug(_19051,_19052),user:ug(_19051,_19052),[ug(three,1),ug(unigram,1),ug(words,1)]) ?
273 3 Call: writelistelems([ug(three,1),ug(unigram,1),ug(words,1)]) ?
274 4 Call: write(ug(three,1)) ?
274 4 Exit: write(ug(three,1)) ?
275 4 Call: nl ?
275 4 Exit: nl ?
276 4 Call: writelistelems([ug(unigram,1),ug(words,1)]) ?
277 5 Call: write(ug(unigram,1)) ?
277 5 Exit: write(ug(unigram,1)) ?
278 5 Call: nl ?
278 5 Exit: nl ?
279 5 Call: writelistelems([ug(words,1)]) ?
280 6 Call: write(ug(words,1)) ?
280 6 Exit: write(ug(words,1)) ?
281 6 Call: nl ?
281 6 Exit: nl ?
282 6 Call: writelistelems([]) ?
283 7 Call: nl ?
283 7 Exit: nl ?
282 6 Exit: writelistelems([]) ?
279 5 Exit: writelistelems([ug(words,1)]) ?
276 4 Exit: writelistelems([ug(unigram,1),ug(words,1)]) ?
273 3 Exit: writelistelems([ug(three,1),ug(unigram,1),ug(words,1)]) ?
284 3 Call: bg(_19035) ?
284 3 Exit: bg(2) ?
285 3 Call: write(bg(2)) ?
285 3 Exit: write(bg(2)) ?
286 3 Call: nl ?
286 3 Exit: nl ?
287 3 Call: findall(bg(_19051,_19015,_19052),user:bg(_19051,_19015,_19052),_19020) ?
288 4 Call: bg(_19051,_19015,_19052) ?
? 288 4 Exit: bg(three,unigram,1) ?
288 4 Redo: bg(three,unigram,1) ?
288 4 Exit: bg(unigram,words,1) ?
287 3 Exit: findall(bg(_19051,_19015,_19052),user:bg(_19051,_19015,_19052),[bg(three,unigram,1),bg(unigram,words,1)]) ?
289 3 Call: writelistelems([bg(three,unigram,1),bg(unigram,words,1)]) ?
290 4 Call: write(bg(three,unigram,1)) ?
290 4 Exit: write(bg(three,unigram,1)) ?
291 4 Call: nl ?
291 4 Exit: nl ?
292 4 Call: writelistelems([bg(unigram,words,1)]) ?
293 5 Call: write(bg(unigram,words,1)) ?
293 5 Exit: write(bg(unigram,words,1)) ?
294 5 Call: nl ?
294 5 Exit: nl ?
295 5 Call: writelistelems([]) ?
296 6 Call: nl ?
296 6 Exit: nl ?
295 5 Exit: writelistelems([]) ?
292 4 Exit: writelistelems([bg(unigram,words,1)]) ?
289 3 Exit: writelistelems([bg(three,unigram,1),bg(unigram,words,1)]) ?
297 3 Call: tg(_18997) ?
297 3 Exit: tg(1) ?
298 3 Call: write(tg(1)) ?
298 3 Exit: write(tg(1)) ?
299 3 Call: nl ?
299 3 Exit: nl ?
300 3 Call: findall(tg(_19051,_19015,_18977,_19052),user:tg(_19051,_19015,_18977,_19052),_18982) ?
301 4 Call: tg(_19051,_19015,_18977,_19052) ?
301 4 Exit: tg(three,unigram,words,1) ?
300 3 Exit: findall(tg(_19051,_19015,_18977,_19052),user:tg(_19051,_19015,_18977,_19052),[tg(three,unigram,words,1)]) ?
302 3 Call: writelistelems([tg(three,unigram,words,1)]) ?
303 4 Call: write(tg(three,unigram,words,1)) ?
303 4 Exit: write(tg(three,unigram,words,1)) ?
304 4 Call: nl ?
304 4 Exit: nl ?
305 4 Call: writelistelems([]) ?
306 5 Call: nl ?
306 5 Exit: nl ?
305 4 Exit: writelistelems([]) ?
302 3 Exit: writelistelems([tg(three,unigram,words,1)]) ?
267 2 Exit: saveubtgrams ?
307 2 Call: generatesentences ?
308 3 Call: ug(_67748) ?
308 3 Exit: ug(3) ?
309 3 Call: bg(_67743) ?
309 3 Exit: bg(2) ?
310 3 Call: tg(_67738) ?
310 3 Exit: tg(1) ?
311 3 Call: nl ?
311 3 Exit: nl ?
312 3 Call: nl ?
312 3 Exit: nl ?
313 3 Call: nl ?
313 3 Exit: nl ?
314 3 Call: _67723 is 3*2*1 ?
314 3 Exit: 6 is 3*2*1 ?
315 3 Call: ug(_67711,_67712) ?
? 315 3 Exit: ug(three,1) ?
316 3 Call: bg(three,_67705,_67706) ?
316 3 Exit: bg(three,unigram,1) ?
317 3 Call: tg(three,unigram,_67698,_67699) ?
317 3 Exit: tg(three,unigram,words,1) ?
318 3 Call: _67690 is 1*1*1 ?
318 3 Exit: 1 is 1*1*1 ?
319 3 Call: write([w123,three,unigram,words,1,1,1,1,6]) ?
319 3 Exit: write([w123,three,unigram,words,1,1,1,1,6]) ?
320 3 Call: nl ?
320 3 Exit: nl ?
321 3 Call: bg(_67651,three,_67653) ?
321 3 Fail: bg(_67651,three,_67653) ?
315 3 Redo: ug(three,1) ?
? 315 3 Exit: ug(unigram,1) ?
322 3 Call: bg(unigram,_67705,_67706) ?
322 3 Exit: bg(unigram,words,1) ?
323 3 Call: tg(unigram,words,_67698,_67699) ?
323 3 Fail: tg(unigram,words,_67698,_67699) ?
315 3 Redo: ug(unigram,1) ?
315 3 Exit: ug(words,1) ?
324 3 Call: bg(words,_67705,_67706) ?
324 3 Fail: bg(words,_67705,_67706) ?
307 2 Exit: generatesentences ?
325 2 Call: told ?
325 2 Exit: told ?
? 1 1 Exit: run(no,no,no,'unigram_test.txt','unigram_test_out.txt') ?
yes
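At the end of the trace, generatesentences multiplies the three counters (3*2*1 = 6). It then looks up every chain ug(W1,F1), bg(W1,W2,F2), tg(W1,W2,W3,F3) and writes the record [w123,W1,W2,W3,F1,F2,F3,P,Total], with P = F1*F2*F3. The EMDR records earlier in this appendix follow the same layout, for example 79*7*1 = 553 for emdr treatment therapist. The sketch below collects these records with findall/3 instead of writing them one by one. The names w123_record/1 and w123_records/1 are hypothetical, and the follow-up bg(_,W1,_) lookup behind the w541 records is not reproduced.

```prolog
:- dynamic ug/1, ug/2, bg/1, bg/3, tg/1, tg/4.

% Database state at the end of the test run above.
ug(3).
ug(three, 1).
ug(unigram, 1).
ug(words, 1).
bg(2).
bg(three, unigram, 1).
bg(unigram, words, 1).
tg(1).
tg(three, unigram, words, 1).

% w123_record(-Record): one record per unigram W1 that starts a bigram
% W1 W2, which in turn starts a trigram W1 W2 W3.
w123_record([w123, W1, W2, W3, F1, F2, F3, P, Total]) :-
    ug(U), bg(B), tg(T),          % counters stored by the n-gram pass
    Total is U * B * T,
    ug(W1, F1),
    bg(W1, W2, F2),
    tg(W1, W2, W3, F3),
    P is F1 * F2 * F3.            % product of the three frequencies

% w123_records(-Records): all such records, in database order.
w123_records(Records) :-
    findall(R, w123_record(R), Records).
```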
Copyright © 2005 Wollie Boehm
Last modified: 17/11/2005