About Keyword Extraction --
So we have to check the quality of the keyword extractor. If we find out that it's not "good", can we swap it for another extractor?
Once we have a decent keyword extractor, we can rank the faculty members, perhaps by using Naive Bayes probabilities.
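To make the ranking idea concrete, here is a rough sketch of one way it could work. Everything in it is an assumption for illustration, not how our app does it: the function name `rank_faculty`, the sample profiles, a uniform prior over faculty members, and add-alpha smoothing so an unseen keyword doesn't zero out a score.

```python
import math
from collections import Counter

def rank_faculty(keywords, profiles, alpha=1.0):
    """Rank profile names by log P(keywords | profile), highest first.

    Assumes a uniform prior over faculty members, so the prior does not
    affect the ordering. alpha is the smoothing constant (an assumption).
    """
    tokenized = {name: text.lower().split() for name, text in profiles.items()}
    vocab = {w for words in tokenized.values() for w in words} | set(keywords)
    scores = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        total = len(words) + alpha * len(vocab)
        # Sum of log-probabilities of each keyword under this profile.
        scores[name] = sum(math.log((counts[k] + alpha) / total) for k in keywords)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical profiles, made up for this example.
profiles = {
    "prof_a": "machine learning text mining information retrieval",
    "prof_b": "computer graphics rendering animation",
}
ranking = rank_faculty(["text", "mining"], profiles)
print(ranking)
```

The keywords would come from whatever extractor we end up using; the sketch only covers the ranking step.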
It can be said that other than the frequency, what makes a word a key is the specialty of the word to the document that the word exists in. If a
word appears much more frequently in a document than with respect to other documents, this can be
another distinguishing feature of that word on deciding whether it is a keyword or not.
Combining these two properties, we obtain the metric TFxIDF (standing for Term Frequency x Inverse
Document Frequency) score, which is the standard metric used in Information extraction [2], and for
a word W in document D, is defined as
TFxIDF(W, D) = P(word in D is W) x [ -log P(W in a document) ].
The first term in this formula is calculated by counting the number of times the word occurs in the
document and dividing it to the total number of words in it. The second term is calculated by counting
the number of documents in the training set that the word occurs in except D and dividing it by the
total number of documents in the training set.
The quote is from this document:
http://www.cs.bilkent.edu.tr/~guvenir/courses/cs550/Workshop/Yasin_Uzun.pdf
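Here is a small sketch of the score as the quote describes it, in case it helps when we compare extractors. The function name `tfxidf` and the sample documents are made up. The add-one smoothing in the second term is my own assumption (the quote doesn't say what to do when no other document contains the word, which would otherwise mean taking log of 0).

```python
import math

def tfxidf(word, doc, training_docs):
    """TFxIDF(W, D) = P(word in D is W) * -log P(W in a document).

    doc and each entry of training_docs are lists of words.
    """
    if not doc:
        return 0.0
    # First term: occurrences of the word in D over the total words in D.
    tf = doc.count(word) / len(doc)
    # Second term: training documents other than D that contain the word,
    # over the total number of training documents.
    df = sum(1 for other in training_docs if other is not doc and word in other)
    # Add-one smoothing (an assumption) keeps the probability above zero.
    p_doc = (df + 1) / (len(training_docs) + 1)
    return tf * -math.log(p_doc)

# Made-up documents for illustration.
docs = [
    "keyword extraction for faculty search".split(),
    "faculty research in computer graphics".split(),
    "search engines rank documents".split(),
]
scores = {w: tfxidf(w, docs[0], docs) for w in set(docs[0])}
for w, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{w}: {s:.3f}")
```

Words that appear only in the first document ("keyword", "extraction") score higher than words shared with other documents ("faculty", "search"), which is the "specialty" effect the quote talks about.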
Steven- Admin
- Posts : 16
Join date : 2010-10-11
Re: About Keyword Extraction --
Remember how we noticed that we could only get 2 results at a time during our tests? Well, it turns out that's just a coincidence. It's possible to get a lot more results. I copied and pasted one professor's entire description into the box and got 9 results, one of them being that same professor with 75% relevance.
Don't know what this means for where to go next in improving results, but at least we know where not to go (searching through the code frantically for a nonexistent something that limits results).
Lenny- Admin
- Posts : 65
Join date : 2010-10-08
Uh oh
Looks like multiplying the string doesn't help extract more keywords. It gets the same results as it would without the multiplier.
... It seems as though it looks for more than just repetition when adding a word. (I guess that's how the extractor filters out common words like "the", "and", "a", "but", etc.)
At the moment I'm giving it a proper string, then cutting it down until the extractor can't get keywords anymore.
I can't think of any way to solve this, so for now I'll just get the program to ask the user to type in more?
Lenny- Admin
- Posts : 65
Join date : 2010-10-08
Re: About Keyword Extraction --
Yeah, I guess we could do that.
Just put up a message saying "Your entry must be at least [_] words long. Please enter more..blah." or something like that.
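Something along these lines could do the check. It's only a sketch: the function name `needs_more_input` is made up, and the minimum is left as a parameter because we haven't settled on a number yet (the 5 below is just a placeholder for the demo).

```python
def needs_more_input(text, min_words):
    """Return a prompt message if text is too short, otherwise None."""
    word_count = len(text.split())
    if word_count < min_words:
        return (f"Your entry must be at least {min_words} words long. "
                f"You entered {word_count}. Please enter more.")
    return None

# min_words=5 is an arbitrary placeholder, not a value we've agreed on.
print(needs_more_input("machine learning", 5))
print(needs_more_input("research on machine learning and text mining", 5))
```

The real threshold should probably come from the experiment of cutting a string down until the extractor stops returning keywords.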
sadia- Admin
- Posts : 75
Join date : 2010-10-10
Re: About Keyword Extraction --
Committed a fix for the extractor.
Changed isWindows to isMac || isUnix. Wrong slashes =P
daniel- Admin
- Posts : 87
Join date : 2010-10-08
Re: About Keyword Extraction --
The fix doesn't work on my computer, so I set it back (but didn't commit yet).
Does the ..\\..\\ not work on your Mac?
Lenny- Admin
- Posts : 65
Join date : 2010-10-08
Re: About Keyword Extraction --
It wasn't working, but it worked after daniel fixed it.
sadia- Admin
- Posts : 75
Join date : 2010-10-10
Re: About Keyword Extraction --
Yeah, I updated. So if the original worked for me and the fix worked for you guys, I guess that means Windows and Mac use the same path names?
Lenny- Admin
- Posts : 65
Join date : 2010-10-08
Re: About Keyword Extraction --
Macs need '/', not '\'.
So Windows needs '\'?
daniel- Admin
- Posts : 87
Join date : 2010-10-08
Re: About Keyword Extraction --
Nah, my computer worked fine with '/':
"lib/KeywordExtractionApp/gate"
Lenny- Admin
- Posts : 65
Join date : 2010-10-08
Re: About Keyword Extraction --
I changed it back to '/' since it worked for everyone.
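For anyone wondering why '/' worked everywhere while '\' didn't, here is a small illustration. It uses Python's `pathlib` only because it can show both platforms' parsing rules on any machine. It is not our app's code, and the constant name `GATE_DIR` is made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

GATE_DIR = "lib/KeywordExtractionApp/gate"

# Windows path rules treat '/' as a separator too, so both platforms
# split the forward-slash path into the same three components.
windows_parts = PureWindowsPath(GATE_DIR).parts
posix_parts = PurePosixPath(GATE_DIR).parts
print(windows_parts)
print(posix_parts)

# On Mac/Unix a backslash is just an ordinary character in a file name,
# so the whole string is read as a single component.
backslash_on_posix = PurePosixPath("lib\\KeywordExtractionApp\\gate").parts
print(backslash_on_posix)
```

So a single forward-slash path works on all our machines, and we don't need the isWindows / isMac / isUnix branching just to pick a separator.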
Lenny- Admin
- Posts : 65
Join date : 2010-10-08