  1. #1
    colin.horne Guest

    Average non-meaningful word ratio

    Hi

    I need to know the ratio of meaningful words to non-meaningful words in a sentence. I don't know if I explained that correctly, so here's an example:

    "hello. this is a test which you'll all find very interesting and will study for many hours when you get home."

    These are the "meaningful" words in that sentence:
    hello, test, you'll, find, very, interesting, study, hours, you, home

    All the rest are "non-meaningful" (i.e. they have no impact on the sentence other than to structure it), so 10 of the 21 words are meaningful, giving a ratio of 10:21. Is there an average ratio like this that holds across English documents (on average, of course)?
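
The count in the example above can be reproduced with a short sketch. The tokenizer regex and the function names are assumptions made for illustration, and the set of meaningful words is simply the list given in the post; nothing here decides by itself which words are meaningful.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def meaningful_ratio(text, meaningful):
    """Return (meaningful word count, total word count) for the text."""
    words = tokenize(text)
    hits = sum(1 for w in words if w in meaningful)
    return hits, len(words)

sentence = ("hello. this is a test which you'll all find very interesting "
            "and will study for many hours when you get home.")
# The hand-picked list from the post (an input, not something computed).
meaningful = {"hello", "test", "you'll", "find", "very", "interesting",
              "study", "hours", "you", "home"}

hits, total = meaningful_ratio(sentence, meaningful)
print(f"{hits}:{total}")  # 10:21
```

Note that 10:21 compares meaningful words with all words; compared with the 11 remaining words the figure would be 10:11.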

    If that doesn't exist, is there a maximum "keyword" average?

    For example, if I'm talking about my pen and my desk:

    "This is my pen. I normally keep my pen on my desk at all times. I work on my desk, and use my pen for writing with."

    Here, the keywords (pen, desk) account for 5 of the 27 words, a ratio of 5:27. Is there a maximum ratio like this, where a keyword should not be used more than X times in a sentence?

    Many thanks, and sorry for the unusual question!

  2. #2
    Red5 (Webmaster, UsingEnglish.com)
    • Member Info
      • Member Type:
      • Interested in Language
      • Native Language:
      • British English
      • Home Country:
      • England
      • Current Location:
      • England
    Join Date
    Nov 2002
    Posts
    3,379

    Re: Average non-meaningful word ratio

    Quote Originally Posted by colin.horne
    If that doesn't exist, is there a maximum "keyword" average?

    For example, if I'm talking about my pen and my desk:

    "This is my pen. I normally keep my pen on my desk at all times. I work on my desk, and use my pen for writing with."

    Here, the keywords (pen, desk) account for 5 of the 27 words, a ratio of 5:27. Is there a maximum ratio like this, where a keyword should not be used more than X times in a sentence?

    You could take a look at our advanced text analyser (you'll need to log in to the members' area to use it):

    http://www.usingenglish.com/members/text-statistics.php

    It does some of what you ask, and if you have any ideas for improving it I'd be happy to discuss options with you (I'm about to start modifying it anyway).

    Hope that helps.
    Last edited by Red5; 26-Nov-2004 at 14:07.

  3. #3
    colinhorne Guest

    Re: Average non-meaningful word ratio

    Hi

    Thanks for that, but it's not exactly what I was looking for...

    I'm actually trying to build some artificial intelligence for a search engine, and it needs to be able to distinguish between keywords and trivial (non-meaningful) words. I do that by counting word frequency: the more often a word appears, the more likely it is to be trivial. However, if a word appears frequently because the article focuses heavily on it, then obviously I don't want it to be considered trivial at all.
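
A minimal sketch of the frequency count described above, assuming a simple regex tokenizer and a hypothetical helper named `relative_frequencies`. It shows the dilemma rather than solving it: a structural word such as "the" and a topic word such as "desk" can both come out near the top.

```python
import re
from collections import Counter

def relative_frequencies(text):
    """Map each word to its share of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}

# Made-up sample text, used only to illustrate the point.
freqs = relative_frequencies("the pen is on the desk and the desk is old")

# List the most frequent words first.
for word, share in sorted(freqs.items(), key=lambda kv: -kv[1])[:3]:
    print(word, round(share, 2))
```

Frequency alone cannot tell the two kinds of word apart, which is why the other tests mentioned in the thread are needed.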

    My solution is:

    If I can find out the average ratio of non-meaningful words to keywords, I'll be able to guess whether a word is non-meaningful or very meaningful. It'll also use loads of other tests at different levels, etc.

    Thanks
    Last edited by Red5; 26-Nov-2004 at 14:08.

  4. #4
    Nahualli Guest

    Re: Average non-meaningful word ratio

    That's a brilliant idea, although I see a major problem with it. To be able to determine a ratio like the one you're describing, you would first have to teach a computer program to analyze what a proper sentence should look like under ALL circumstances. Basically you'd have to teach a heuristic algorithm to go beyond itself and analyze all the things that cannot be measured, namely intent, mood and tone.

    As we've all seen with so-called "grammar checks" and "translation software", technology has a loooooooooooooong way to go before this is a reality.

    -Nah-

  5. #5
    colinhorne Guest

    Re: Average non-meaningful word ratio

    Hehe, yeah. That would be the ultimate goal, but I could never be bothered to do that :P

    That's only one "layer" of the analysing... The other "layers" look at where the text is. For example, text in bold is considered important, but I need a way of making sure any trivial bold text ("the", etc.) doesn't also get treated as important.

    Once finished, it'll probably try to learn from these layers:
    (+) - makes word more important
    (-) - makes word less important

    (+) <b>, <h>, <a>, <i>, <u>, etc.
    (+) frequency ratio UP TO (perhaps...) 1:5. A lower ratio results in (-)
    (A few more; I'm working on them)
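
The two layers listed above could be combined roughly as follows. This is only a sketch under stated assumptions: the +1/-1 weights and the names `EMPHASIS_TAGS`, `MAX_SHARE` and `layer_score` are invented for illustration, and the "1:5" limit is read here as "the word makes up at most one word in five", which is one possible reading of the post.

```python
EMPHASIS_TAGS = {"b", "h", "a", "i", "u"}  # the tags named in the post
MAX_SHARE = 1 / 5  # assumed reading of the "1:5" frequency limit

def layer_score(count, total_words, tags):
    """Add up the (+)/(-) votes from the tag layer and the frequency layer."""
    score = 0
    if EMPHASIS_TAGS & set(tags):
        score += 1  # (+) the word appears inside an emphasis tag
    share = count / total_words if total_words else 0.0
    # (+) while the word's share stays within the limit, (-) beyond it
    score += 1 if share <= MAX_SHARE else -1
    return score

print(layer_score(3, 27, ["b"]))  # bold word used 3 times in 27 words
print(layer_score(10, 27, []))    # unformatted word used 10 times in 27
```

With these weights a bold but over-frequent word ends up at zero, so the frequency layer can cancel out trivial bold text.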

    Also, I'd like it to learn from previous searches. *Most* people don't include words such as "the" in their search queries (but some do, so I'd need a way of checking for that...), so a word that has appeared in a search term would have its importance increased.
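
Learning from past searches might look something like the sketch below. The `QueryLog` class and its methods are hypothetical names, and the sample queries are made up. Using the share of queries that contain a word, rather than a raw count, is one assumed way to cope with the few people who do type "the".

```python
from collections import Counter

class QueryLog:
    """Track how often each word shows up in past search queries."""

    def __init__(self):
        self.queries = 0
        self.word_hits = Counter()

    def record(self, query):
        """Count each distinct word once per query."""
        self.queries += 1
        self.word_hits.update(set(query.lower().split()))

    def importance(self, word):
        """Share of recorded queries that contained the word."""
        if self.queries == 0:
            return 0.0
        return self.word_hits[word.lower()] / self.queries

log = QueryLog()
for query in ["cheap pen", "the pen shop", "desk lamp"]:
    log.record(query)

print(log.importance("pen"), log.importance("the"))
```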

    Loads of other ideas in the back of my head, but can't quite put my finger on them yet...

    It should be quite cool once finished though (I hope - otherwise I've wasted several weeks' work!).

    Cheers

  6. #6
    Tdol (Editor, UsingEnglish.com)
    • Member Info
      • Member Type:
      • English Teacher
      • Native Language:
      • British English
      • Home Country:
      • UK
      • Current Location:
      • Philippines
    Join Date
    Nov 2002
    Posts
    42,527

    Re: Average non-meaningful word ratio

    It depends very much on length: lexical density falls as a text grows, so I think it would be hard to find an absolute ratio, but it might be possible to find a ratio for different lengths.
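
One way to look for a length-dependent ratio is to measure the share of content words over growing prefixes of a text. This is a rough sketch: `FUNCTION_WORDS` is a tiny hand-picked list and `density_by_length` is a hypothetical helper. A real study would need a proper word list and many texts of different lengths.

```python
import re

# Tiny illustrative list of function words (an assumption, far from complete).
FUNCTION_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "on",
                  "in", "at", "for", "with", "my", "i", "this", "all"}

def density_by_length(text, step):
    """Return (prefix length, content-word share) for each growing prefix."""
    words = re.findall(r"[a-z']+", text.lower())
    results = []
    for end in range(step, len(words) + 1, step):
        prefix = words[:end]
        content = sum(1 for w in prefix if w not in FUNCTION_WORDS)
        results.append((end, content / end))
    return results

text = ("This is my pen. I normally keep my pen on my desk at all times. "
        "I work on my desk, and use my pen for writing with")
for length, density in density_by_length(text, 9):
    print(length, round(density, 2))
```

A 27-word sample is far too short to show any trend; the point is only the shape of the measurement.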

  7. #7
    Red5 (Webmaster, UsingEnglish.com)

    Re: Average non-meaningful word ratio

    Quote Originally Posted by colinhorne
    I'm actually trying to build some artificial intelligence for a search engine, and it needs to be able to distinguish between keywords and trivial (non-meaningful) words.
    Hi Colin.

    If you're talking about search engines and their algos, I would recommend you visit SearchEngineWatch.com and ask in their Search Technology & Relevancy forum. Their moderator, Orion, has an encyclopedic knowledge of all things algorithmic regarding search engines.

    There's one condition, that you come back and let us know how you get on!

    Kind regards,

    Red5

  8. #8
    Red5 (Webmaster, UsingEnglish.com)

    Re: Average non-meaningful word ratio

    Quote Originally Posted by colinhorne
    If I can find out the average ratio of non-meaningful words to keywords, I'll be able to guess whether a word is non-meaningful or very meaningful. It'll also use loads of other tests at different levels, etc.

    Thanks
    This is actually an area of investigation quite close to my heart. I will be sure to follow your discussion with interest, here or at the SEW forum. It sounds to me as if you need an expanded stop-word list to define the non-meaningful words.
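
A stop-word list combined with the frequency count could be sketched like this. The `STOP_WORDS` set is a made-up miniature covering only the sample text, and `keyword_candidates` is a hypothetical name; an expanded list would be needed for real documents.

```python
import re
from collections import Counter

# Miniature stop-word list, just large enough for the sample below.
STOP_WORDS = {"this", "is", "my", "i", "on", "at", "all", "and", "for", "with"}

def keyword_candidates(text, top_n=3):
    """Drop stop words, then rank the remaining words by frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

text = ("This is my pen. I normally keep my pen on my desk at all times. "
        "I work on my desk, and use my pen for writing with")
print(keyword_candidates(text, top_n=2))  # [('pen', 3), ('desk', 2)]
```

Once the stop words are gone, frequent words count in favour of being a keyword instead of against it.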

    Anyway, I wish you luck with your research.

    Edited to add:

    I'd be interested in starting a new forum area here specifically to do with analysing language. Would anyone (other than me) be interested?
    Last edited by Red5; 19-Nov-2004 at 13:22.

  9. #9
    colinhorne Guest

    Re: Average non-meaningful word ratio

    Hi

    Thanks for the advice, I'll head over to searchenginewatch.com shortly...

    Once (if ever!) I get this finished, I'd be happy to give you the source code (or document the workings, etc.), if you're interested. That is, if it works, of course. I have to admit, it wasn't desperately sensible of me to dive into this project though: my main strengths are encryption and database-driven apps, not evaluating the English language (and I've got the satisfaction of having failed my English exams at school to prove it).

    Cheers

  10. #10
    Tdol (Editor, UsingEnglish.com)

    Re: Average non-meaningful word ratio

    I would be interested.
