  1. #1
    Tdol is offline Editor, UsingEnglish.com
    • Member Info
      • Member Type:
      • English Teacher
      • Native Language:
      • British English
      • Home Country:
      • UK
      • Current Location:
      • Philippines
    Join Date
    Nov 2002
    Posts
    42,736
    Post Thanks / Like

    Google Ngram Viewer

    You can compare word usage across some of Google's book databases.

  2. #2
    5jj is offline VIP Member
    • Member Info
      • Member Type:
      • Retired English Teacher
      • Native Language:
      • British English
      • Home Country:
      • England
      • Current Location:
      • Czech Republic
    Join Date
    Oct 2010
    Posts
    28,168
    Post Thanks / Like

    Re: Google Ngram Viewer

    Fascinating, thanks. That could become addictive.

  3. #3
    Sanmayce is offline Junior Member
    • Member Info
      • Member Type:
      • Student or Learner
      • Native Language:
      • Bulgarian
      • Home Country:
      • Bulgaria
      • Current Location:
      • Bulgaria
    Join Date
    Jan 2011
    Posts
    36
    Post Thanks / Like

    Re: Google Ngram Viewer

    Hi,
    Google Ngram Viewer is indeed a very good initiative; it is a strong starting point for future word and phrase comparisons and analysis.

    I am not a Google fan, but I admit that offering the n-gram datasets as a free download speaks well of the people behind this project.

    My console tool Leprechaun_quadrupleton makes use of these sets in particular. I downloaded and began to use the 4-grams, which come as 400 files of about 1GB each, i.e. about 400GB in total.

    Running Leprechaun_quadrupleton on them gives 400 files of about 8MB each, or 3.2GB of unique 4-grams. A sample of the input rows and of the resulting 4-grams looks like this:
    Code:
    D:\_KA45F~1\_4>dir
    12/12/2010 01:37 PM 1,111,609,996 googlebooks-eng-us-all-4gram-20090715-0.csv
    01/26/2011 06:46 PM 315 googlebooks-eng-us-all-4gram-20090715-0.csv.EXCERPT
    01/26/2011 05:13 AM 514,048 Leprechaun_quadrupleton_Intel_IA-32_11.1.exe
    D:\_KA45F~1\_4>type googlebooks-eng-us-all-4gram-20090715-0.csv.EXCERPT
    ...
    It cut me to 2002 4 4 4
    It cut me to 2004 4 4 4
    It cut me to 2005 6 6 6
    It cut me to 2006 2 2 2
    It cut me to 2007 1 1 1
    It cut me to 2008 1 1 1
    It declares that ' 1816 1 1 1
    It declares that ' 1832 2 2 2
    It declares that ' 1833 1 1 1
    It declares that ' 1834 1 1 1
    It declares that ' 1838 1 1 1
    ...
    D:\_KA45F~1\_4>dir *.excerpt/b>test.lst
    D:\_KA45F~1\_4>Leprechaun_quadrupleton_Intel_IA-32_11.1.exe test.lst test.wrd
    Leprechaun(Fast Greedy Word-Ripper), rev. 13_7pluses quadrupleton_r1, written by Svalqyatchx.
    Leprechaun: 'Oh, well, didn't you hear? Bigger is good, but jumbo is dear.'
    Kaze: Let's see what a 3-way hash + 6,602,752 Binary-Search-Trees can give us,
    also the performance of a 3-way hash + 6,602,752 B-Trees of order 3.
    Size of input file with files for Leprechauning: 53
    Allocating memory 424MB ... OK
    Size of Input TEXTual file: 315
    |; Word count: 39 of them 1 distinct; Done: 64/64
    Bytes per second performance: 315B/s
    Words per second performance: 39W/s
    Flushing unsorted words ...
    Time for making unsorted wordlist: 1 second(s)
    Deallocated memory in MB: 424
    Allocated memory for words in MB: 1
    Allocated memory for pointers-to-words in MB: 1
    Sorting(with 'MultiKeyQuickSortX26Sort' by J. Bentley and R. Sedgewick) ...
    Sort pass 26/26 ...
    Flushing sorted words ...
    Time for sorting unsorted wordlist: 1 second(s)
    Leprechaun: Done.
    D:\_KA45F~1\_4>type test.wrd
    it_cut_me_to
    There are many ways to use 4-gram phrases. At the moment I am contemplating an automatic reporter that compares the 4-grams taken from an incoming text with the 4-grams taken from googlebooks-eng-us-all-4gram. In short: a kind of phrase-checker.
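
    The reduction shown in the transcript (many per-year rows collapsing into one line such as it_cut_me_to) can be sketched in a few lines of Python. This is only an illustration, not how Leprechaun itself works: the function name is hypothetical, and it assumes each row is tab-separated with the phrase in the first field, followed by the year and three counts.

```python
import re

def unique_four_grams(rows):
    """Collapse per-year rows into a sorted list of distinct 4-grams."""
    seen = set()
    for row in rows:
        # Assumed layout: phrase <TAB> year <TAB> three counts.
        phrase = row.split("\t", 1)[0]
        # Keep only runs of letters, lower-cased, as the .wrd output does.
        words = re.findall(r"[a-z]+", phrase.lower())
        if len(words) == 4:  # drops entries such as "It declares that '"
            seen.add("_".join(words))
    return sorted(seen)

rows = [
    "It cut me to\t2002\t4\t4\t4",
    "It cut me to\t2004\t4\t4\t4",
    "It declares that '\t1816\t1\t1\t1",
]
print(unique_four_grams(rows))  # ['it_cut_me_to']
```

    A set is used because the same phrase is repeated once per year, so most rows add nothing new.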

    Regards

  4. #4
    5jj is offline VIP Member

    Re: Google Ngram Viewer

    Sanmayce's post made me feel so old and out of touch with modern life. Still, I didn't do too badly I suppose. I understood the first two words.

    Oh, and the last one.

  5. #5
    birdeen's call is offline VIP Member
    • Member Info
      • Member Type:
      • Student or Learner
      • Native Language:
      • Polish
      • Home Country:
      • Poland
      • Current Location:
      • Poland
    Join Date
    Jul 2010
    Posts
    5,099
    Post Thanks / Like

    Re: Google Ngram Viewer

    Originally posted by fivejedjon:
    "Sanmayce's post made me feel so old and out of touch with modern life. Still, I didn't do too badly I suppose. I understood the first two words.

    Oh, and the last one."

    This might help you understand more (it did in my case).

  6. #6
    Sanmayce is offline Junior Member

    Re: Google Ngram Viewer

    Don't feel that way, fivejedjon. The human touch and vision are far superior to ANY machine; at least I believe this 100%. Computers already beat (even humiliate) humans at information processing (just ask who/what is world chess champion), BUT here the soul enters... and everything turns into mystery, i.e. something not yet defined.

    Consider this text fragment (an excerpt from a movie's subtitles):

    D:\_KA45F~1\_4>type "[2003] When the Last Sword Is Drawn 7.7@imdb CD2.srt.EXCERPT"
    ...
    497
    01:02:27,956 --> 01:02:35,089
    Morioka, in Nanbu.
    It's pretty as a picture!
    498
    01:02:35,196 --> 01:02:38,723
    There's nowhere like it in all Japan!
    499
    01:02:39,834 --> 01:02:43,827
    The Morioka cherry blossom
    splits through rock to bloom.
    500
    01:02:44,506 --> 01:02:48,875
    The Morioka magnolia blooms
    even facing north.
    501
    01:02:49,911 --> 01:02:54,848
    So I want you to run ahead
    of the times.
    502
    01:02:55,950 --> 01:03:00,046
    Go wild. Bloom.

    The idea is to get, with the help of some software, all the 4-grams in the given text (a 4-gram is a sequence of 4 consecutive words, which I loosely call a collocation):

    D:\_KA45F~1\_4>type test3.wrd
    ahead_of_the_times
    blooms_even_facing_north
    blossom_splits_through_rock
    cherry_blossom_splits_through
    i_want_you_to
    it_in_all_japan
    it_s_pretty_as
    like_it_in_all
    magnolia_blooms_even_facing
    morioka_cherry_blossom_splits
    morioka_magnolia_blooms_even
    nowhere_like_it_in
    pretty_as_a_picture
    run_ahead_of_the
    s_nowhere_like_it
    s_pretty_as_a
    so_i_want_you
    splits_through_rock_to
    the_morioka_cherry_blossom
    the_morioka_magnolia_blooms
    there_s_nowhere_like
    through_rock_to_bloom
    to_run_ahead_of
    want_you_to_run
    you_to_run_ahead
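
    A rough Python sketch of this extraction step follows. The tokenizing rules are inferred from the list above, not taken from the tool's source: words are runs of letters, an apostrophe splits a word without ending the phrase, and any other punctuation or digit ends the phrase. The function name is hypothetical.

```python
import re

def four_grams(text, n=4):
    """Return the sorted, distinct n-grams of a text, joined with underscores."""
    grams = set()
    # Anything that is not a letter, apostrophe or whitespace ends a phrase,
    # so no n-gram crosses a full stop, a comma or a timestamp.
    for run in re.split(r"[^A-Za-z'\s]+", text):
        # Within a run, words are plain letter sequences ("It's" -> it, s).
        words = re.findall(r"[a-z]+", run.lower())
        for i in range(len(words) - n + 1):
            grams.add("_".join(words[i:i + n]))
    return sorted(grams)

sample = "Morioka, in Nanbu.\nIt's pretty as a picture!"
print(four_grams(sample))
# ['it_s_pretty_as', 'pretty_as_a_picture', 's_pretty_as_a']
```

    Line breaks count as ordinary whitespace here, which is why a phrase such as cherry_blossom_splits_through can span two subtitle lines.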

    Computers (in particular tablets, the future HANDY personal assistants) will remain only assistants and nothing more, even when AI (artificial intelligence) enters our lives, as I hope it will. I mean the old school is not dying, just being enhanced.

  7. #7
    birdeen's call is offline VIP Member

    Re: Google Ngram Viewer

    Originally posted by Sanmayce:
    "(just ask who/what is world chess champion),"

    As far as I know, there are separate titles for humans and for computers, and there is no joint title. (We're too scared to give them a chance!)

  8. #8
    Sanmayce is offline Junior Member

    Re: Google Ngram Viewer

    Originally posted by birdeen's call:
    "(We're too scared to give them a chance!)"

    Ha-ha, you are right!
    I have been watching this humiliation since Garry's first battles with IBM's Deep Blue, then with Deeper Blue, and also with other chess supercomputers.

    I have a very high opinion of Kasparov, but he told us (in 1997-) that a machine cannot "see" the game, a statement I knew even back then was WRONG. A computer can be taught to develop tactics (mini-strategy) and, by scaling up, to turn them into very deep strategy, which has nothing to do with the power of humans, namely soul or creativity, as he put it in his interview.

  9. #9
    Sanmayce is offline Junior Member

    Re: Google Ngram Viewer

    'Graphein', a 4-gram phrase-checker, revision 1-

    GOALS:
    - To offer 100% free, open-source, copyleft software (32-bit Windows console tools written in C);
    - To enrich (beautify, as kids would say) the ability to produce phrase reports and analyses of user-chosen English texts, in order to estimate how appropriate their 4-gram phrases/collocations are;
    - The target users are mostly people (kids, learners and native English speakers alike) who want to explore English collocations by immersing themselves in 100+ million google-4-grams;
    - To allow in-depth phrase searches independent of third parties (and eventually of second parties too).

    First drawback: the package must be (and will be) as simple as possible to use. This is still to be done. The whole process of making reports must take two steps:
    - Copying all needed text files (and folders) into the working directory;
    - Running a single batch file.
    Second drawback: it is still not downloadable.
    Third drawback: the 'Graphein' developer is an amateur.
    Fourth drawback: 'Graphein' is currently awfully slow. Analyzing 'The Little Match Girl' took 02:27:00 (hh:mm:ss), or roughly 400 x 23 seconds (23 seconds per file).
    Fifth drawback: something is rotten with the googlebooks-eng-us-all-4gram-20090715 files! I am disappointed by the unexpectedly high number of unknown (Unfamiliar!) 4-grams:
    - 'The Little Match Girl', analyzed with 'Graphein' r.1, gives Total/Found/Unfamiliar: 580/370/210 phrases.
    - I cannot figure out why phrases like these are not part of the US English Google books:
    wonderful_roast_goose_and Unfamiliar!
    wonderful_smell_of_roast Unfamiliar!
    wonderfully_the_fire_burned Unfamiliar!
    would_surely_beat_her Unfamiliar!

    Does anybody know what causes this frustrating misery?

    My wish here is to present, in a hurry, some aspects of a not-yet-completed free software package designed to produce reports comparing user phrases with google-books phrases.
    Below I give a short step-by-step guide to using these 100% free 32-bit console programs.

    ~ The whole process looks like:
    [incoming text file(s)] -> phrase-checker-package -> [text file listing all phrases, each marked as encountered in google-books-phrases or not]

    ~ Or as in the following example:
    [...] -> phrase-checker-package -> [...]
    [lille_pige_med_svovlstikkerne] -> phrase-checker-package -> [lille_pige_med_svovlstikkerne Unfamiliar!]
    [med_svovlstikkerne_by_jean] -> phrase-checker-package -> [med_svovlstikkerne_by_jean Unfamiliar!]
    [more_beautiful_than_the] -> phrase-checker-package -> [more_beautiful_than_the Found!]
    [rattled_by_terribly_fast] -> phrase-checker-package -> [rattled_by_terribly_fast Found!]
    [reached_both_her_hands] -> phrase-checker-package -> [reached_both_her_hands Found!]
    [really_seemed_to_the] -> phrase-checker-package -> [really_seemed_to_the Found!]
    [those_in_the_printshops] -> phrase-checker-package -> [those_in_the_printshops Unfamiliar!]
    [...] -> phrase-checker-package -> [...]
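
    The lookup in the middle of this pipeline amounts to a membership test. A minimal Python sketch, assuming the known 4-grams fit in an in-memory set (the real data is far larger and is split across 400 files); the function name and the tiny sample set are hypothetical:

```python
def check_phrases(phrases, known_four_grams):
    """Label each phrase and return Total/Found/Unfamiliar counts."""
    labelled = [
        (p, "Found!" if p in known_four_grams else "Unfamiliar!")
        for p in phrases
    ]
    found = sum(1 for _, label in labelled if label == "Found!")
    return labelled, (len(labelled), found, len(labelled) - found)

# Toy stand-in for the real 4-gram set (hypothetical contents).
known = {"more_beautiful_than_the", "rattled_by_terribly_fast"}
phrases = [
    "lille_pige_med_svovlstikkerne",
    "more_beautiful_than_the",
    "rattled_by_terribly_fast",
]

labelled, totals = check_phrases(phrases, known)
for phrase, label in labelled:
    print(phrase, label)
print("Total/Found/Unfamiliar: %d/%d/%d" % totals)
```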

    ~ I intend the final report(tabulated) to look like this:
    ...
    lille_pige_med_svovlstikkerne \t Unfamiliar!
    ...
    med_svovlstikkerne_by_jean \t Unfamiliar! 3rd-bigram-OK!
    more_beautiful_than_the \t Found!
    ...
    rattled_by_terribly_fast \t Found!
    reached_both_her_hands \t Found!
    really_seemed_to_the \t Found!
    ...
    those_in_the_printshops \t Unfamiliar! 1st-bigram-OK! 2nd-bigram-OK!
    ...
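
    One possible way to produce such tabulated lines, including the bigram hints, is sketched below in Python. The source of the known bigrams is an assumption (the post does not say where they come from), and both function names are hypothetical.

```python
ORDINALS = ("1st", "2nd", "3rd")

def bigram_hints(phrase, known_bigrams):
    """List which of the three consecutive word pairs of a 4-gram are known."""
    words = phrase.split("_")
    pairs = ["_".join(words[i:i + 2]) for i in range(len(words) - 1)]
    return [
        f"{ordinal}-bigram-OK!"
        for ordinal, pair in zip(ORDINALS, pairs)
        if pair in known_bigrams
    ]

def report_line(phrase, known_four_grams, known_bigrams):
    """Build one tab-separated report line for a phrase."""
    if phrase in known_four_grams:
        return f"{phrase}\tFound!"
    status = " ".join(["Unfamiliar!"] + bigram_hints(phrase, known_bigrams))
    return f"{phrase}\t{status}"

# Toy data (hypothetical): real sets would come from the n-gram files.
four = {"more_beautiful_than_the"}
two = {"those_in", "in_the", "by_jean"}
for p in ("more_beautiful_than_the",
          "med_svovlstikkerne_by_jean",
          "those_in_the_printshops"):
    print(report_line(p, four, two))
```

    With this toy data the last phrase gets the hints 1st-bigram-OK! and 2nd-bigram-OK!, matching the sample report above.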

    ~ Discussion note:
    I would appreciate any suggestion(s) regarding simplifying the usage of the whole package.
    After all I develop(amateurishly) this package especially for kids having PCs.
    I want to write a PDF file with simple step-by-step instructions, but I need some feedback (pointing out the difficulties) in order to simplify the package enough to make it usable even by computer dummies/beginners.
    Making the package downloadable remains something of a problem, since revision 1 is about 756MB and my site's poor bandwidth is already heavily loaded.

    Reference #1:
    PDF: 'The_Little_Match_Girl'_analyzed_by_'Graphein'
    Reference #2:
    PDF: Getting_started_using_'Graphein'_phrase-package

    Enjoy!
    Last edited by Sanmayce; 06-Feb-2011 at 16:22.

  10. #10
    birdeen's call is offline VIP Member

    Re: Google Ngram Viewer

    Some thoughts after skimming your PDFs.

    1) We put spaces before opening brackets:

    develop (amateurishly) - correct
    develop(amateurishly) - incorrect

    2) If you want your program to be used by "computer dummies", you should probably make it run in a separate window. Many computer users have never seen a TUI.

    3) If you want your program to be widely used, you certainly need bandwidth. You could try to find other enthusiasts who would be willing to cooperate.

    4) You should probably tell people what 4-grams are and why they want to find them.
