InfinityQuest - Programming Code Tutorials and Examples with Python, C++, Java, PHP, C#, JavaScript, Swift and more

Iterating Efficiently over Large or Expensive Datasets in PHP

InfinityCoder November 20, 2016

You want to iterate through a list of items, but the entire list takes up a lot of memory or is very slow to generate.

Use a generator:

function FileLineGenerator($file) {
   if (!$fh = fopen($file, 'r')) {
       return;
   }

   while (false !== ($line = fgets($fh))) {
       yield $line;
   }

   fclose($fh);
}

$file = FileLineGenerator('log.txt');
foreach ($file as $line) {
    if (preg_match('/^rasmus: /', $line)) { print $line; }
}

Generators provide a simple way to efficiently loop over items without the overhead and expense of loading all the data into an array.
A generator is a function that returns an iterable Generator object. As you loop over that object, PHP resumes the generator function each time it needs the next value, and the function hands that value back using the yield keyword.
Unlike normal functions where you start fresh every time, PHP preserves the current function state between calls to a generator.

This allows you to keep any necessary information to provide the next value.
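
As a small illustration of state surviving between yields, here is a sketch built around a hypothetical runningTotal() generator (it is not part of the original recipe). The local variable $sum is set once and never reset, because the function is resumed rather than restarted:

```php
<?php
// Hypothetical helper for illustration: yields the running total of its input.
function runningTotal(array $numbers) {
    $sum = 0;                  // initialized once, when iteration begins
    foreach ($numbers as $n) {
        $sum += $n;            // $sum keeps its value across yields
        yield $sum;
    }
}

$totals = [];
foreach (runningTotal([5, 10, 20]) as $total) {
    $totals[] = $total;
}

print implode(', ', $totals) . "\n";   // prints "5, 15, 35"
```

A regular function would have to take the previous total as an argument or keep it in a static or global variable to get the same effect.
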
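If there’s no more data, exit the function by reaching its end or with an empty return statement.

(In PHP 5.5 and 5.6, returning a value from a generator is a compile-time error. Since PHP 7.0, a generator may return a value; it is not delivered to the loop, but can be read afterward with Generator::getReturn().)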
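
How a generator may end depends on the PHP version: PHP 5.5 and 5.6 accept only a bare return, while PHP 7.0 and later also let a generator return a final value that the caller reads with Generator::getReturn() once iteration has finished. A minimal sketch, assuming PHP 7.0 or later; nonEmptyLines() is a hypothetical helper, not part of the original recipe:

```php
<?php
// Hypothetical generator: yields each non-blank line, then returns how many
// blank lines it skipped. Returning a value requires PHP 7.0 or later.
function nonEmptyLines(array $lines) {
    $skipped = 0;
    foreach ($lines as $line) {
        if (trim($line) === '') {
            $skipped++;
            continue;
        }
        yield $line;
    }
    return $skipped;           // final value, not part of the iteration
}

$gen = nonEmptyLines(["a\n", "\n", "b\n", "   \n"]);

$kept = [];
foreach ($gen as $line) {
    $kept[] = $line;
}

// getReturn() is only valid after the generator has run to completion.
$skipped = $gen->getReturn();

print count($kept) . " kept, $skipped skipped\n";   // prints "2 kept, 2 skipped"
```
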
A perfect use of a generator is processing all the lines in a file.

The simplest way is to use the file() function. This opens the file, loads each line into an element of an array, and closes the file.

However, the entire file is then held in memory at once.

$file = file('log.txt');
foreach ($file as $line) {
    if (preg_match('/^rasmus: /', $line)) { print $line; }
}

Another option is to use the standard file reading functions, but then your code for reading from the file and acting on each line gets intertwined.

This doesn’t make for reusable or easy-to-read code:

function print_matching_lines($file, $regex) {
   // Open the file that was passed in rather than a hard-coded name
   if (!$fh = fopen($file, 'r')) {
       return;
   }
   while (false !== ($line = fgets($fh))) {
       if (preg_match($regex, $line)) { print $line; }
   }
   fclose($fh);
}

print_matching_lines('log.txt', '/^rasmus: /');

However, if you wrap the file-processing code in a generator, you get the best of both options: a general function that efficiently iterates through the lines of a file, and syntax as clean as if all the data were stored in an array:

function FileLineGenerator($file) {
   if (!$fh = fopen($file, 'r')) {
       return;
   }
 
   while (false !== ($line = fgets($fh))) {
       yield $line;
   }
   fclose($fh);
}
$file = FileLineGenerator('log.txt');
foreach ($file as $line) {
    if (preg_match('/^rasmus: /', $line)) { print $line; }
}

In a generator, control passes back and forth between the loop and the function via the yield statement.

The first time the generator is asked for a value, control begins at the top of the function and pauses when it reaches a yield statement, returning the value.

In this example, the FileLineGenerator() generator function loops through lines of a file.

After the file is opened, fgets() is called in a loop. As long as there are more lines, the loop yields $line back to the iterator.

At the end of the file, the loop terminates, the file is closed, and the function terminates.

Because nothing more is yielded, the foreach loop exits. Now, FileLineGenerator() can be used any time you want to loop through a file.
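
The hand-off between the loop and the generator can be traced with a small sketch. The traced() generator and the $log array are hypothetical names used only for this illustration; each log entry records which side was running at that moment:

```php
<?php
// Hypothetical generator that records when each part of its body runs.
// $log is passed by reference so the generator and the caller share one log.
function traced(array &$log) {
    $log[] = 'generator: started';
    yield 'first';
    $log[] = 'generator: resumed after first yield';
    yield 'second';
    $log[] = 'generator: finished';
}

$log = [];
$gen = traced($log);           // creates the Generator; the body has not run yet
$log[] = 'caller: generator created';

foreach ($gen as $value) {
    $log[] = "caller: received $value";
}

print implode("\n", $log) . "\n";
```

In the printed log, "caller: generator created" appears before "generator: started", which shows that calling a generator function does not run its body. After that, the entries alternate between the generator and the caller until the generator finishes.
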

The previous FileLineGenerator() example prints lines beginning with rasmus: .

The following one prints a random line from the file:

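$line_number = 0;
$selected = null;   // stays null if the file is empty or cannot be opened
foreach (FileLineGenerator('sayings.txt') as $line) {
    $line_number++;
    // Replace the current pick with probability 1/$line_number, so every
    // line ends up equally likely to be selected.
    if (mt_rand(0, $line_number - 1) == 0) {
        $selected = $line;
    }
}

if ($selected !== null) {
    // fgets() keeps the line ending, so strip it before adding our own
    print rtrim($selected, "\r\n") . "\n";
}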

Despite a completely different use case, FileLineGenerator() is reusable without modifications.

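In this example, the generator function is called directly in the foreach statement instead of storing its result in a variable first.
You cannot rewind a generator once iteration has started; generators only move forward. To loop over the data again, call the generator function again to get a new generator.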
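
A minimal sketch of that restriction; countTo() is a hypothetical generator used only for illustration. Looping over a generator that has already finished throws an Exception, so the usual way to iterate a second time is to call the generator function again:

```php
<?php
// Hypothetical generator for illustration: yields 1 through $limit.
function countTo($limit) {
    for ($i = 1; $i <= $limit; $i++) {
        yield $i;
    }
}

$gen = countTo(3);

// First pass: consumes the generator completely.
$firstPass = [];
foreach ($gen as $n) {
    $firstPass[] = $n;
}

// Second pass over the same object: PHP refuses to start over.
$secondPassFailed = false;
try {
    foreach ($gen as $n) {
        // never reached
    }
} catch (Exception $e) {
    $secondPassFailed = true;
}

// Calling the function again gives a fresh generator that starts from the top.
$freshPass = [];
foreach (countTo(3) as $n) {
    $freshPass[] = $n;
}

print implode(', ', $firstPass) . "\n";                    // prints "1, 2, 3"
print ($secondPassFailed ? 'cannot reuse' : 'reused') . "\n";   // prints "cannot reuse"
print implode(', ', $freshPass) . "\n";                    // prints "1, 2, 3"
```
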
