InfinityQuest - Programming Code Tutorials and Examples with Python, C++, Java, PHP, C#, JavaScript, Swift and more

Processing Every Word in a File in PHP

InfinityCoder December 26, 2016

You want to do something with every word in a file. For example, you want to build a concordance of how many times each word is used to compute similarities between documents.

Read in each line with fgets(), separate the line into words, and process each word:

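// $php_errormsg was removed in PHP 8, so use explicit messages instead
$fh = fopen('great-american-novel.txt', 'r') or die("Can't open file");
// fgets() returns false at end of file; comparing with !== false
// keeps a final line such as "0" from being skipped as falsy
while (($s = fgets($fh)) !== false) {
    $words = preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);
    // process words
}
fclose($fh) or die("Can't close file");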
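
One way to fill in the "process words" step is the concordance mentioned at the start of this recipe. The following is a minimal sketch, not a definitive implementation: the function name build_concordance() is hypothetical, folding words to lowercase is an assumption about how you want to count, and an in-memory stream stands in for a real file.

```php
<?php
// Hypothetical helper: count how often each whitespace-delimited word occurs.
// $fh is any readable stream handle, such as one returned by fopen().
function build_concordance($fh): array
{
    $counts = [];
    while (($line = fgets($fh)) !== false) {
        $words = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            // Assumption: "The" and "the" should count as the same word
            $key = strtolower($word);
            $counts[$key] = ($counts[$key] ?? 0) + 1;
        }
    }
    return $counts;
}

// Usage with an in-memory stream instead of a real file
$fh = fopen('php://memory', 'w+');
fwrite($fh, "The cat saw the dog\nthe dog ran\n");
rewind($fh);
$counts = build_concordance($fh);
fclose($fh);

arsort($counts); // most frequent words first
print_r($counts);
```

Because splitting is on whitespace only, punctuation stays attached to words, so "dog" and "dog." would be counted separately.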

This example calculates the average word length in a file:

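$word_count = $word_length = 0;

$fh = fopen('great-american-novel.txt', 'r') or die("Can't open file");
// Read until fgets() reports end of file by returning false
while (($s = fgets($fh)) !== false) {
    $words = preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $word_count++;
        $word_length += strlen($word); // strlen() counts bytes
    }
}
fclose($fh);

// Avoid dividing by zero when the file contains no words
if ($word_count > 0) {
    printf("The average word length over %d words is %.02f characters.",
           $word_count,
           $word_length / $word_count);
} else {
    print "The file contains no words.";
}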
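
Note that strlen() counts bytes, so the average above is only a character count for single-byte text. If the file is UTF-8 with accented or non-Latin words, a variant along these lines may be closer to what you want. It is a sketch under stated assumptions: the mbstring extension is enabled, the input is valid UTF-8, and average_word_length() is a hypothetical helper name, not part of the recipe.

```php
<?php
// Hypothetical helper: average word length in characters rather than bytes.
// Assumes the mbstring extension is enabled and the lines are valid UTF-8.
function average_word_length(array $lines): ?float
{
    $word_count = $char_count = 0;
    foreach ($lines as $line) {
        // The /u modifier makes the pattern treat the subject as UTF-8
        foreach (preg_split('/\s+/u', $line, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            $word_count++;
            $char_count += mb_strlen($word, 'UTF-8');
        }
    }
    // Return null for "no words" instead of dividing by zero
    return $word_count > 0 ? $char_count / $word_count : null;
}

var_dump(strlen('café'));             // length in bytes
var_dump(mb_strlen('café', 'UTF-8')); // length in characters
var_dump(average_word_length(["café au lait"]));
```

The function takes an array of lines so it can be fed from file() or from lines collected in an fgets() loop.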

How you process every word depends on how you define “word.” The code in this recipe uses the Perl-compatible regular expression engine’s \s whitespace metacharacter, which includes space, tab, newline, carriage return, and formfeed.
The Perl-compatible engine also has a word-boundary assertion (\b) that matches between a word character (a letter, digit, or underscore) and a nonword character (anything else).

Using \b instead of \s to delimit words most noticeably treats words with embedded punctuation differently.

The term 6 o’clock is two words when split by whitespace (6 and o’clock); it’s four words when split by word boundaries (6, o, ’, and clock).
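
The difference can be checked with a short sketch. It uses a straight apostrophe in the test string for simplicity. One detail to be aware of: splitting on \b also returns the whitespace between words as a piece of its own, so this example filters out whitespace-only pieces to get the four words described above.

```php
<?php
$term = "6 o'clock";

// Split on runs of whitespace: punctuation stays attached to its word
$by_space = preg_split('/\s+/', $term, -1, PREG_SPLIT_NO_EMPTY);

// Split at word boundaries, then drop the whitespace-only pieces
$pieces = preg_split('/\b/', $term, -1, PREG_SPLIT_NO_EMPTY);
$by_boundary = array_values(array_filter($pieces, function ($piece) {
    return trim($piece) !== '';
}));

print_r($by_space);    // 6, o'clock
print_r($by_boundary); // 6, o, ', clock
```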
