Word frequency script for a text

From Mondothèque

#!/usr/bin/env python

from collections import Counter
import string

"""
Prepare your book in plain-text format.
This script builds a frequency dictionary of the words in the book,
sorts the words by frequency, and writes the result to a text file
called frequencies.txt.
Capitalization and punctuation are ignored.
"""

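# functions
# remove caps + breaks + punctuation

def remove_punct(f):
    # join all lines into one lowercase string, dropping line breaks
    tokens = (' '.join(line.replace('\n', '') for line in f)).lower()
    # delete every punctuation character
    for c in string.punctuation:
        tokens = tokens.replace(c, "")
    return tokens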

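# create frequency dictionary

def freq_dict(tokens):
    frequency_d = {}
    # split() without an argument skips the empty strings that
    # split(" ") would produce for repeated spaces
    tokens = tokens.split()
    for word in tokens:
        try:
            frequency_d[word] += 1
        except KeyError:
            frequency_d[word] = 1
    return frequency_d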

# sort words by frequency (uses Counter from the collections module)

def sort_dict(frequency_d):
    c = Counter(frequency_d)
    # most_common() returns (word, count) pairs, highest count first
    frequency = c.most_common()
    return frequency

# write words to text file

def write_to_file(frequency):
    # each line is written as "count : word"
    with open('frequencies.txt', 'wt') as g:
        for key, value in frequency:
            g.write("{} : {} \n".format(value, key))

# open the source text as f (specify your source text here)

with open('0_plus_petit_document.txt', 'rt') as f:
    tokens = remove_punct(f)

print(tokens)
frequency_d = freq_dict(tokens)
print(frequency_d)
frequency = sort_dict(frequency_d)
write_to_file(frequency)
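
The same pipeline (lowercase, strip punctuation, count, sort) can also be sketched more compactly. This is only an illustrative variant, not part of the original script: the function name `word_frequencies` and the sample sentence are made up for the example. It swaps the character-by-character `replace` loop for `str.translate` and lets `Counter` do the counting directly.

```python
from collections import Counter
import string


def word_frequencies(text):
    """Return (word, count) pairs, most frequent first (hypothetical helper)."""
    # Build a translation table that deletes every ASCII punctuation character.
    table = str.maketrans("", "", string.punctuation)
    # split() with no argument splits on any run of whitespace, so line
    # breaks and repeated spaces never produce empty tokens.
    words = text.lower().translate(table).split()
    return Counter(words).most_common()


# Made-up sample text, used only to show the output format.
sample = "The cat sat. The cat ran!\nThe end."
for word, count in word_frequencies(sample):
    print("{} : {}".format(count, word))
```

With this sample, "the" (3 occurrences) and "cat" (2) come first, followed by the words that occur once. Like the script above, it only strips the ASCII characters in `string.punctuation`, so typographic quotes or guillemets in a French text would stay attached to their words.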