CSC207-DSSL

About Keyword Extraction --

4 posters


Post  Steven Thu Oct 28, 2010 10:04 pm

So we have to check the quality of the keyword extractor. If we find out that it's not "good", can we switch it with another extractor?

Once we have a decent keyword extractor, we can rank the faculty members, perhaps by using Naive Bayes probabilities.

It can be said that other than the frequency, what makes a word a keyword is the specialty of the word to the document that the word exists in. If a word appears much more frequently in a document than in other documents, this can be another distinguishing feature of that word in deciding whether it is a keyword or not.

Combining these two properties, we obtain the TFxIDF (Term Frequency x Inverse Document Frequency) score, which is the standard metric used in information extraction [2], and which for a word W in document D is defined as

TFxIDF (W, D) = P(word in D is W) x [ - log P(W in a document) ].

The first term in this formula is calculated by counting the number of times the word occurs in the document and dividing it by the total number of words in the document. The second term is calculated by counting the number of documents in the training set, other than D, that the word occurs in, and dividing it by the total number of documents in the training set.

The quote is from this document:
http://www.cs.bilkent.edu.tr/~guvenir/courses/cs550/Workshop/Yasin_Uzun.pdf
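
For anyone who wants to try the formula on our own data, here is a minimal Python sketch of the score as the quote describes it. The function name `tfxidf`, the token-list inputs, and the add-one smoothing are my own assumptions (the quote does not say what to do when a word appears in none of the other documents, where the log is undefined). This is not how our extractor actually works, just the formula written out.

```python
import math
from collections import Counter


def tfxidf(word, doc, other_docs):
    """Score `word` for `doc` against `other_docs` (all are lists of tokens)."""
    if not doc:
        return 0.0
    # First term: P(word in D is W) = occurrences of W in D / total words in D.
    tf = Counter(doc)[word] / len(doc)
    # Second term: fraction of the other training documents that contain W.
    containing = sum(1 for d in other_docs if word in d)
    # Assumption (not in the quote): add-one smoothing so log(0) never occurs.
    p_doc = (containing + 1) / (len(other_docs) + 1)
    return tf * -math.log(p_doc)


if __name__ == "__main__":
    doc = ["gate", "keyword", "keyword", "the"]
    others = [["the", "cat"], ["the", "dog"], ["a", "keyword"]]
    for w in sorted(set(doc)):
        print(w, round(tfxidf(w, doc, others), 4))
```

A word that shows up in every other document gets a score of zero here, which matches the idea that common words should not count as keywords.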

Steven
Admin

Posts : 16
Join date : 2010-10-11


Re: About Keyword Extraction --

Post  Lenny Sun Nov 07, 2010 8:42 pm

remember how we noticed that we could only get 2 results at a time during our tests? well, turns out that's just a coincidence. it's possible to get a lot more results. i copied and pasted one of the professors' entire descriptions into the box and got 9 results, one of them being that same professor with 75% relevance.
dunno what this means for us in terms of improving results, but at least we know where not to go (searching frantically through the code for a non-existent something that limits results)

Lenny
Admin

Posts : 65
Join date : 2010-10-08


Uh oh

Post  Lenny Wed Nov 10, 2010 8:21 pm

looks like multiplying the string doesn't help extract more keywords. it gets the same results as it would without the multiplier.
... it seems as though it looks for more than just repetition to add the word. (i guess that's how the extractor factors out common words like "the", "a", "but", etc.)
atm i'm giving it a proper string, then butchering it down until the extractor can't get keywords anymore

i can't think of any way to solve this, so for now i'll just get the program to ask the user to type in more?
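
One possible explanation, sketched in Python. This assumes the extractor scores words by relative frequency, like the TFxIDF quote earlier in this thread (we have not confirmed that it does), and that "multiplying the string" means repeating the same text several times. `term_frequency` is a made-up helper for illustration, not something from the extractor.

```python
from collections import Counter


def term_frequency(tokens):
    """Relative frequency of each word: count / total number of words."""
    total = len(tokens)
    return {word: count / total for word, count in Counter(tokens).items()}


if __name__ == "__main__":
    text = "keyword extraction for faculty research keyword".split()
    # Repeating the text multiplies every count and the total by the same
    # factor, so the relative frequencies come out identical.
    print(term_frequency(text) == term_frequency(text * 5))
```

If that is what is going on, only adding new, different words to the input would change what gets extracted, which fits with asking the user to type in more.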


Re: About Keyword Extraction --

Post  sadia Wed Nov 10, 2010 8:31 pm

yeah, i guess we could do that.


just put a message saying "Your entry must be at least [_] words long. Please enter more..blah." or something like that.
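
A rough sketch of that check, in Python for illustration only. `check_entry_length` and `min_words` are hypothetical names, and the minimum is left as a parameter because we have not picked a number yet.

```python
def check_entry_length(entry, min_words):
    """Return an error message if `entry` is too short, otherwise None."""
    # Split on whitespace so extra spaces or newlines do not count as words.
    words = entry.split()
    if len(words) < min_words:
        return (
            f"Your entry must be at least {min_words} words long. "
            "Please enter more."
        )
    return None
```

The caller would show the message and ask again whenever the result is not `None`.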

sadia
Admin

Posts : 75
Join date : 2010-10-10


Re: About Keyword Extraction --

Post  daniel Wed Nov 10, 2010 11:19 pm

committed a fix for the extractor.

isWindows changed to isMac || isUnix. wrong slashes =P

daniel
Admin

Posts : 87
Join date : 2010-10-08


Re: About Keyword Extraction --

Post  Lenny Wed Nov 10, 2010 11:47 pm

the fix doesn't work on my computer, so i set it back (but didn't commit yet)
does the ..\\..\\ not work on your mac?


Re: About Keyword Extraction --

Post  sadia Wed Nov 10, 2010 11:48 pm

it wasn't working, but it worked after daniel fixed it ..


Re: About Keyword Extraction --

Post  Lenny Wed Nov 10, 2010 11:51 pm

ya i updated. so if the original worked for me and the fix worked for you guys, i guess that means windows and mac use the same path names?


Re: About Keyword Extraction --

Post  daniel Wed Nov 10, 2010 11:57 pm

macs need '/' not '\'

so windows need '\'?


Re: About Keyword Extraction --

Post  Lenny Thu Nov 11, 2010 12:02 am

naw my computer worked fine with '/'
"lib/KeywordExtractionApp/gate"


Re: About Keyword Extraction --

Post  Lenny Thu Nov 11, 2010 12:48 am

i changed it back to the '/' since it worked for everyone
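
For the record, a small sketch of why '/' worked for everyone. It uses Python's `pathlib` only to show the path rules of each OS; it is not our project code, and `path_parts` is a made-up helper name.

```python
from pathlib import PurePosixPath, PureWindowsPath


def path_parts(path, windows):
    """Split `path` using Windows rules or POSIX (Mac/Unix) rules."""
    flavour = PureWindowsPath if windows else PurePosixPath
    return flavour(path).parts


if __name__ == "__main__":
    forward = "lib/KeywordExtractionApp/gate"
    backward = "lib\\KeywordExtractionApp\\gate"
    for label, windows in (("windows", True), ("mac/unix", False)):
        # Windows accepts both separators; on Mac/Unix a backslash is just
        # an ordinary character, so the backslash version stays one piece.
        print(label, path_parts(forward, windows), path_parts(backward, windows))
```

So a path written with '/' splits the same way under both sets of rules, while one written with '\' only splits on Windows.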
