InfinityQuest - Programming Code Tutorials and Examples with Python, C++, Java, PHP, C#, JavaScript, Swift and more

Counting Lines, Paragraphs, or Records in a File in PHP

InfinityCoder December 26, 2016

You want to count the number of lines, paragraphs, or records in a file.

To count lines, use fgets():

$lines = 0;
 
if ($fh = fopen('orders.txt','r')) {
  while (! feof($fh)) {
    if (fgets($fh)) {
      $lines++;
    }
  }
}
 
print $lines;

Because fgets() reads a line at a time, you can count the number of times it’s called before reaching the end of a file.
To count paragraphs, increment the counter only when you read a blank line:

$paragraphs = 0;
 
if ($fh = fopen('great-american-novel.txt','r')) {
  while (! feof($fh)) {
     $s = fgets($fh);
     if (("\n" == $s) || ("\r\n" == $s)) {
        $paragraphs++;
     }
  }
}
 
print $paragraphs;

To count records, increment the counter only when the line read contains just the record separator and whitespace. Here the record separator is stored in $record_separator:

$records = 0;
$record_separator = '--end--';
 
if ($fh = fopen('great-american-textfile-database.txt','r')) {
  while (! feof($fh)) {
    $s = rtrim(fgets($fh));
    if ($s == $record_separator) {
      $records++;
    }
  }
}
 
print $records;

When counting lines, $lines is incremented only if fgets() returns a true value. As fgets() moves through the file, it returns each line it retrieves.

Once no lines remain, fgets() returns false, so $lines isn’t incremented for the failed read. Because EOF has been reached on the file, feof() returns true, and the while loop ends.
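
One caveat with testing the return value for truth: PHP treats the string "0" as false, so a final line that contains only 0 and has no trailing line break would not be counted. The following is a minimal sketch of a stricter variant that compares the result with false explicitly. The count_lines() helper name is hypothetical and not part of the original recipe.

```php
<?php
// Hypothetical helper, not part of the original recipe.
function count_lines($file) {
    $fh = fopen($file, 'r');
    if ($fh === false) {
        return false;            // the file could not be opened
    }
    $lines = 0;
    // fgets() returns false only when there is nothing left to read,
    // so compare with !== false instead of relying on truthiness
    while (fgets($fh) !== false) {
        $lines++;
    }
    fclose($fh);
    return $lines;
}
```

Calling count_lines('orders.txt') returns the line count, or false if the file can't be opened.
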
When counting paragraphs, the solution works properly on simple text but may produce unexpected results when presented with a long string of blank lines or a file without two consecutive line breaks.

These problems can be remedied with functions based on preg_split().

If the file is small and can be read into memory, use the split_paragraphs() function.

This function returns an array containing each paragraph in the file.
For example:

function split_paragraphs($file,$rs="\r?\n") {
  $text = file_get_contents($file);
  $matches = preg_split("/(.*?$rs)(?:$rs)+/s",$text,-1,
                        PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
  return $matches;
}

The contents of the file are broken on two or more consecutive newlines and returned in the $matches array.

The default record-separation regular expression, \r?\n, matches both Windows and Unix line breaks.
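
If only the number of paragraphs is needed, a simpler split may be enough. The sketch below is an assumption-laden variant, not the function above: the count_paragraphs() name is hypothetical, and it uses a plainer pattern that splits directly on runs of two or more line breaks rather than capturing the delimiter.

```php
<?php
// Hypothetical helper: counts paragraphs instead of returning them.
function count_paragraphs($file) {
    $text = file_get_contents($file);
    if ($text === false) {
        return false;            // the file could not be read
    }
    // Trim so leading or trailing blank lines don't create extra pieces,
    // then split on runs of two or more Windows or Unix line breaks.
    $pieces = preg_split('/(?:\r?\n){2,}/', trim($text), -1,
                         PREG_SPLIT_NO_EMPTY);
    return count($pieces);
}
```

Because the quantifier is {2,}, a long run of blank lines counts as a single paragraph break, and a file with no blank lines at all counts as one paragraph.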

If the file is too big to read into memory at once, use the split_paragraphs_largefile() function, which reads the file in 16 KB chunks:

function split_paragraphs_largefile($file,$rs="\r?\n") {
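  $unmatched_text = '';
  $paragraphs = array();
 
  // $php_errormsg was removed in PHP 8, so report a fixed message instead
  $fh = fopen($file,'r') or die("can't open $file");
 
  while(! feof($fh)) {
    // compare against false: an empty or "0" chunk is not a read error
    $s = fread($fh,16384);
    if ($s === false) { die("can't read $file"); }
    if ($s === '') { continue; }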
    $text_to_split = $unmatched_text . $s;
 
    $matches = preg_split("/(.*?$rs)(?:$rs)+/s",$text_to_split,-1,
                          PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
 
    // if the last chunk doesn't end with two record separators, save it
    // to prepend to the next section that gets read
    $last_match = $matches[count($matches)-1];
    if (! preg_match("/$rs$rs\$/",$last_match)) {
        $unmatched_text = $last_match;
        array_pop($matches);
    } else {
        $unmatched_text = '';
    }
 
    $paragraphs = array_merge($paragraphs,$matches);
  }
 
  // after reading all sections, if there is a final chunk that doesn't
  // end with the record separator, count it as a paragraph
  if ($unmatched_text) {
      $paragraphs[] = $unmatched_text;
  }
  return $paragraphs;
}

This function uses the same regular expression as split_paragraphs() to split the file into paragraphs.

When it finds a paragraph end in a chunk read from the file, it saves the rest of the text in the chunk in $unmatched_text and prepends it to the next chunk read.

This includes the unmatched text as the beginning of the next paragraph in the file.
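
When a large file only needs its paragraphs counted, not collected, a line-by-line pass avoids holding any paragraph text in memory. The following is a rough sketch under stated assumptions: the count_paragraphs_streaming() name is hypothetical, it does not use preg_split(), and it treats a line containing only whitespace as blank, which differs slightly from the fgets() solution shown earlier.

```php
<?php
// Hypothetical helper: counts paragraphs as runs of non-blank lines.
function count_paragraphs_streaming($file) {
    $fh = fopen($file, 'r');
    if ($fh === false) {
        return false;            // the file could not be opened
    }
    $paragraphs = 0;
    $in_paragraph = false;
    while (($line = fgets($fh)) !== false) {
        if (trim($line) === '') {
            // a blank (or whitespace-only) line ends the current paragraph
            $in_paragraph = false;
        } elseif (! $in_paragraph) {
            // first non-blank line after a break starts a new paragraph
            $paragraphs++;
            $in_paragraph = true;
        }
    }
    fclose($fh);
    return $paragraphs;
}
```

Counting the start of each run of text, not each blank line, means consecutive blank lines and a missing blank line at the end of the file don't change the result.
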
The record-counting code shown earlier lets fgets() figure out how long each line is.

If you can supply a reasonable upper bound on line length, stream_get_line() provides a more concise way to count records.

This function reads a line until it reaches a certain number of bytes or it sees a particular delimiter.

Supply it with the record separator as the delimiter, as shown:

$records = 0;
$record_separator = '--end--';
 
if ($fh = fopen('great-american-textfile-database.txt','r')) {
  $done = false;
  while (! $done) {
    $s = stream_get_line($fh, 65536, $record_separator);
    if (feof($fh)) {
       $done = true;
    } else {
        $records++;
    }
  }
}
 
print $records;

This example assumes that each record is no more than 64 KB (65,536 bytes) long. Each call to stream_get_line() returns one record, not including the record separator.
When stream_get_line() has advanced past the last record separator, it reaches the end of the file, so $done is set to true to stop counting records.
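
To see what each call returns, the same loop can collect the records instead of counting them. This is a minimal sketch under stated assumptions: the read_records() name and its parameters are hypothetical, and unlike the example above it trims the line breaks around each record, skips empty pieces, and also keeps any trailing text that isn't followed by a separator.

```php
<?php
// Hypothetical helper: returns the records in $file as an array.
function read_records($file, $record_separator, $max_length = 65536) {
    $fh = fopen($file, 'r');
    if ($fh === false) {
        return false;            // the file could not be opened
    }
    $records = array();
    while (! feof($fh)) {
        $s = stream_get_line($fh, $max_length, $record_separator);
        if ($s === false) {
            break;               // nothing left to read
        }
        // Drop the line breaks around the separator and skip the empty
        // remainder that follows the final separator.
        $s = trim($s);
        if ($s !== '') {
            $records[] = $s;
        }
    }
    fclose($fh);
    return $records;
}
```

The number of records is then count(read_records('great-american-textfile-database.txt', '--end--')).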
