Thursday, August 4, 2011

PHP copy on write - how PHP manages variable memory

I've been asked a similar question by several developers, so I think it is better to write the answer down. Let's check the code:

//assume we have a large array
$largeArray = getLargeSizeArray();

function doTask(array $large)
{
    //do the task
}
doTask($largeArray);

The question goes like this: the argument is passed by value, which means a copy of $largeArray is made, and that takes a lot more memory. Isn't it better to pass the argument by reference, as in function doTask(array &$large)?

My answer to this question is always 'No'. To be honest, I simply don't want developers to think that passing by reference is good practice, even for the sake of memory. But the truth is that it depends on what 'the task' inside doTask() actually does. Most of the time, we can simply pass by value.

To get a solid understanding, we'd better dig deeper and see how the Zend Engine manages variables internally.

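Zend uses a C struct, zval, to store the value of a variable (this is the PHP 5.2 layout; PHP 5.3 renamed the last two fields to refcount__gc and is_ref__gc without changing their meaning):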

typedef struct _zval_struct {
    zvalue_value value;
    zend_uint refcount;
    zend_uchar type;
    zend_uchar is_ref;
} zval;

'zvalue_value value' is where the value of the variable is stored. zvalue_value is a union:

typedef union _zvalue_value {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;
    zend_object_value obj;
} zvalue_value;

As you can guess, zval.type stores the variable's type. Zend uses zval.type together with zval.value to make PHP a weakly (dynamically) typed language, even though C is statically typed. Anyway, typing is not what we want to discuss here.

Take a simple line of code: $name = 'henry'; The question is: how does Zend use a zval to store $name, or, how does a zval know that it is storing the value of $name? There is no field in zval that holds the name. The answer is that PHP stores variable names in a hash table, called the symbol_table, which maps each variable name to its value (a zval).
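To make the split between names and values concrete, here is a minimal C sketch. It is not Zend's code: mini_zval, symbol, symbol_table, symtab_set and symtab_get are hypothetical names, and a small fixed-size array stands in for the real HashTable. The only point it shows is that the variable name lives in the table, while the zval holds nothing but the value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a zval: it holds only the value, never the name. */
typedef struct {
    const char *str;            /* e.g. "henry" */
} mini_zval;

/* One symbol_table entry: a variable name (without '$') mapped to a zval. */
typedef struct {
    const char *name;
    mini_zval *value;
} symbol;

/* The real symbol_table is a HashTable; a small array is enough here. */
typedef struct {
    symbol entries[16];
    size_t count;
} symbol_table;

/* Bind a name to a zval. This sketch does not handle redefinition. */
void symtab_set(symbol_table *t, const char *name, mini_zval *z)
{
    assert(t->count < sizeof t->entries / sizeof t->entries[0]);
    t->entries[t->count].name = name;
    t->entries[t->count].value = z;
    t->count++;
}

/* Look a variable up by name; returns NULL if it is not defined. */
mini_zval *symtab_get(const symbol_table *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0)
            return t->entries[i].value;
    }
    return NULL;
}
```

Binding a second name to the same mini_zval makes both lookups return the same pointer, which is the "mapping mechanism" the engine relies on when two variables share one value.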

Now let's check the PHP code:

$name  = 'henry';
$fname = $name;
unset($name);

The first line: PHP allocates 6 bytes of memory (not counting the zval itself) to store 'henry' (5 bytes) plus the terminating \0 (1 byte), the NUL character.
The second line: a new variable $fname is created, and the value of $name is "copied" to $fname.
The third line: unset($name) tries to free the memory taken by $name.

This kind of code is quite common. If PHP allocated new memory for every assignment, then in this example it would need 12 bytes for $name and $fname. We know we don't really need that much memory: we can simply make $fname's entry in the symbol_table refer to the same zval that $name refers to. And that is exactly what the Zend Engine does. Hmm, that sounds like we are not really copying, we are referring. So what happens when we do unset($name)? How does PHP know there is still a $fname referring to the same zval?

Time to have a look at "zend_uint refcount". Let's try this:

$name = 'henry';
xdebug_debug_zval('name');

The output is "name: (refcount=1, is_ref=0)='henry'"

We can see that zval.refcount=1, which means one variable refers to this zval. Now do this:

$fname = $name;
xdebug_debug_zval('fname');

The output is "fname: (refcount=2, is_ref=0)='henry'"! Strange? Shouldn't it be refcount=1? Let's try:

xdebug_debug_zval('name');

We get the same output: "name: (refcount=2, is_ref=0)='henry'"!

So $fname and $name are actually referring to the same zval. The value of zval.refcount tells PHP that two variables refer to that zval. If we assign $name to more new variables, PHP simply increases zval.refcount and does NOT allocate more memory. OK, what happens if we unset($name)? You can guess! Right, PHP simply decreases zval.refcount. Let's do this:

unset($name);
xdebug_debug_zval('fname');

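The output is "fname: (refcount=1, is_ref=0)='henry'". You know what? In this situation (two variables referring to the same zval), unset() cannot release the memory: it only decreases the refcount. The zval is freed only when its refcount drops to 0, for example when we also unset($fname).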
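The refcount bookkeeping above can be modeled in a few lines of C. This is a simplified sketch, not the engine's implementation: mini_zval, zval_new, zval_assign, zval_unset and the zvals_freed counter are hypothetical names, and only string values are modeled. Assignment shares the zval and bumps refcount; unset() frees the zval only when the last reference goes away.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical model of a PHP 5 zval holding a string.
   Only the refcount bookkeeping is mimicked; this is not Zend's code. */
typedef struct {
    char *str;
    unsigned refcount;
} mini_zval;

/* Counts freed zvals so the behavior can be observed. */
static int zvals_freed = 0;

/* $name = 'henry';  -> a fresh zval referenced by one variable. */
mini_zval *zval_new(const char *s)
{
    mini_zval *z = malloc(sizeof *z);
    if (z == NULL)
        abort();
    z->str = malloc(strlen(s) + 1);   /* characters plus the trailing '\0' */
    if (z->str == NULL)
        abort();
    strcpy(z->str, s);
    z->refcount = 1;
    return z;
}

/* $fname = $name;  -> share the zval instead of copying the string. */
mini_zval *zval_assign(mini_zval *src)
{
    src->refcount++;
    return src;
}

/* unset($x);  -> drop one reference; free only when none are left. */
void zval_unset(mini_zval *z)
{
    assert(z->refcount > 0);
    if (--z->refcount == 0) {
        free(z->str);
        free(z);
        zvals_freed++;
    }
}
```

Creating $name, assigning it to $fname and then unsetting $name leaves refcount at 1 with nothing freed, mirroring the xdebug output for $fname; a second unset on $fname is what finally releases the memory.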

Now, try this:

$name  = 'henry';
$fname = $name;
$name  = 'li';

Obviously, $fname is still 'henry'. But if $fname refers to the same zval, shouldn't its value change to 'li' too? Well, PHP has a copy on write mechanism: before PHP changes a variable, it checks zval.refcount first. If zval.refcount > 1, PHP creates a new zval for the variable being written, decreases the old zval's refcount by 1, and updates the symbol_table so that $fname and $name refer to different zvals. So at this point PHP must allocate new memory. And also at this point, if we unset($name), we really do save some memory.
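Copy on write for a plain assignment can be sketched in C as well. This is a hypothetical model rather than Zend's source: mini_zval, copy_str, zval_new and zval_write are illustrative names, and the mini_zval **slot parameter stands in for a variable's symbol_table entry. A write to a shared zval detaches only the writer; the other variable keeps the old value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical model of a PHP 5 zval holding a string. */
typedef struct {
    char *str;
    unsigned refcount;
} mini_zval;

/* Duplicate a C string (strdup is POSIX, not ISO C, so spell it out). */
static char *copy_str(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p == NULL)
        abort();
    strcpy(p, s);
    return p;
}

mini_zval *zval_new(const char *s)
{
    mini_zval *z = malloc(sizeof *z);
    if (z == NULL)
        abort();
    z->str = copy_str(s);
    z->refcount = 1;
    return z;
}

/* Assign a new string to the variable whose symbol_table slot is *slot. */
void zval_write(mini_zval **slot, const char *s)
{
    mini_zval *z = *slot;
    assert(z->refcount > 0);
    if (z->refcount > 1) {
        /* Shared: leave the old zval to the other variable(s) and
           give this variable a zval of its own (copy on write). */
        z->refcount--;
        *slot = zval_new(s);
    } else {
        /* Sole owner: overwrite in place, no new zval needed. */
        char *p = copy_str(s);
        free(z->str);
        z->str = p;
    }
}
```

After separation both zvals have refcount 1, so a later write to $name goes down the in-place branch and allocates no new zval.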

Now we know that when PHP passes by value, or copies one variable to another, it is not "really copying". It makes both refer to the same zval to save memory. So, back to the question at the beginning:

//assume we have a large array
$largeArray = getLargeSizeArray();

function doTask(array $large)
{
    //do the task
}
doTask($largeArray);

Simply passing $largeArray by value into doTask() will not cost more memory. But, as I said, it really depends on what we do inside doTask(). If we change the value of the argument, PHP has to spend more memory. Like this:

//assume we have a large array
$largeArray = getLargeSizeArray();

function doTask(array $large)
{
    $large[0] = 'xxxx';
}
doTask($largeArray);

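Because we write to $large while its zval is still shared with $largeArray (refcount > 1), PHP has to separate them: it creates a new zval and duplicates the whole array into it, so the caller's $largeArray stays unchanged. That copy is where the extra memory goes.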
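Here is a C sketch of why the write inside doTask() is the expensive part. It is a simplified model under stated assumptions: a plain int array stands in for Zend's HashTable, and mini_array_zval, array_new, pass_by_value, array_write and bytes_copied are hypothetical names. Passing the array only increments refcount; the first write to the shared array duplicates every element before changing one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an array zval: a plain int array stands in for
   Zend's HashTable. Only refcounting and separation are mimicked. */
typedef struct {
    int *items;
    size_t len;
    unsigned refcount;
} mini_array_zval;

/* Total bytes of element data duplicated by separation. */
static size_t bytes_copied = 0;

mini_array_zval *array_new(size_t len)
{
    mini_array_zval *a = malloc(sizeof *a);
    if (a == NULL)
        abort();
    a->items = calloc(len, sizeof *a->items);   /* all elements start at 0 */
    if (a->items == NULL && len > 0)
        abort();
    a->len = len;
    a->refcount = 1;
    return a;
}

/* doTask($largeArray): the parameter shares the caller's zval. */
mini_array_zval *pass_by_value(mini_array_zval *arg)
{
    arg->refcount++;
    return arg;
}

/* $large[i] = v; : a shared array is separated (fully copied) first. */
void array_write(mini_array_zval **slot, size_t i, int v)
{
    mini_array_zval *a = *slot;
    assert(i < a->len);
    if (a->refcount > 1) {
        mini_array_zval *copy = array_new(a->len);
        memcpy(copy->items, a->items, a->len * sizeof *a->items);
        bytes_copied += a->len * sizeof *a->items;
        a->refcount--;          /* the caller keeps the original */
        *slot = a = copy;
    }
    a->items[i] = v;
}
```

Reading from the parameter never reaches the copy branch, which is why a read-only doTask() can take its argument by value at no extra cost; only the first write pays for the full copy, and later writes reuse it.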

Finally, just a quick mention: what is "zend_uchar is_ref"? I think you can easily guess now:

$name  = 'henry';
$fname = &$name;
xdebug_debug_zval('name');

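The output is "name: (refcount=2, is_ref=1)='henry'". is_ref=1 marks the zval as a reference: $name and $fname share it on purpose, so writing through either variable changes the shared value instead of triggering copy on write.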
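A small C model shows how is_ref changes the write path. It is a simplified sketch, not Zend's API: mini_zval, copy_str, zval_new, zval_assign_ref and zval_write are illustrative names. A shared zval is separated on write only when is_ref is 0, so a write through one reference is visible through the other. The sketch assumes the source zval is not already shared by value; PHP 5 separates such a zval before turning it into a reference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical PHP 5-style zval including the is_ref flag. */
typedef struct {
    char *str;
    unsigned refcount;
    unsigned char is_ref;
} mini_zval;

static char *copy_str(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p == NULL)
        abort();
    strcpy(p, s);
    return p;
}

mini_zval *zval_new(const char *s)
{
    mini_zval *z = malloc(sizeof *z);
    if (z == NULL)
        abort();
    z->str = copy_str(s);
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* $fname = &$name; : share the zval and mark it as a reference.
   Only an unshared (or already referenced) source is handled here. */
mini_zval *zval_assign_ref(mini_zval *src)
{
    assert(src->is_ref || src->refcount == 1);
    src->refcount++;
    src->is_ref = 1;
    return src;
}

/* Assign a string through a variable's slot. Only a shared zval that
   is not a reference is separated; a reference is written in place. */
void zval_write(mini_zval **slot, const char *s)
{
    mini_zval *z = *slot;
    assert(z->refcount > 0);
    if (z->refcount > 1 && !z->is_ref) {
        z->refcount--;
        *slot = zval_new(s);
    } else {
        char *p = copy_str(s);
        free(z->str);
        z->str = p;
    }
}
```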
