PHP Sadness

The empty($v) function is actually (!isset($v) || $v != true)-and-silence-warnings

Update 2013-01-07: As of PHP 5.5, empty now works on expressions.

php empty is shorthand for is-falsy-but-silence-unset-warnings.

In PHP <5.5, it only works on single variables, not on expressions:

$ php -r 'error_reporting(-1); var_dump(empty($a || $b || $c && $d));'

Parse error: syntax error, unexpected T_BOOLEAN_OR, expecting ')' in Command line code on line 1

$ php -r 'error_reporting(-1); var_dump(!($a || $b || $c && $d));'

Notice: Undefined variable: a in Command line code on line 1

Notice: Undefined variable: b in Command line code on line 1

Notice: Undefined variable: c in Command line code on line 1
bool(true)
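For comparison, the notices in the second command can be avoided by applying php empty to each operand separately, because !empty($x) has the same truth value as $x but does not complain when $x is unset. A minimal sketch follows; the error handler and the $messages and $result names are illustrative additions, and the short array syntax assumes PHP 5.4 or later:

```php
<?php
error_reporting(-1);

// Illustrative addition: collect any notices or warnings instead of printing them.
$messages = [];
set_error_handler(function ($severity, $message) use (&$messages) {
    $messages[] = $message;
    return true;
});

// $a, $b, $c and $d are deliberately never assigned.
// Each bare operand of the original expression is replaced by !empty(...).
$result = !(!empty($a) || !empty($b) || !empty($c) && !empty($d));

var_dump($result);          // same truth value as the second command above
var_dump(count($messages)); // number of notices or warnings collected
```

This keeps the result but repeats php empty once per variable, which is exactly the verbosity a single call on the whole expression would avoid.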

Because of this, to safely operate on variables that might not be set, or to distinguish between an unset variable and one with the value NULL, you must first check each one individually with something like array_key_exists('varname', get_defined_vars()) rather than testing the whole expression with a single php empty. (Some languages circumvent this issue by requiring variable declaration or giving undefined variables explicit values.) Using php array_key_exists as a way to test for definedness produces this result:

// given:
unset($undef);
$null = NULL;
$false = FALSE;
$true = TRUE;

// these operations:
isset($var)
empty($var)
array_key_exists('var', get_defined_vars()) // "defined" below

// produce these results:
        isset   empty   defined
undef   -       X       -
null    -       X       X
false   X       X       X
true    X       -       X
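The table can be reproduced with a short script. This is a sketch assuming PHP 7 or later (for the scalar type declarations); probe() and mark() are hypothetical helper names, not part of PHP:

```php
<?php
// Returns [isset, empty, defined] for each of the four cases in the table.
function probe(): array
{
    // given ($undef is intentionally never assigned):
    $null  = NULL;
    $false = FALSE;
    $true  = TRUE;

    // Snapshot of the variables that exist in this scope right now.
    $defined = get_defined_vars();

    return [
        'undef' => [isset($undef), empty($undef), array_key_exists('undef', $defined)],
        'null'  => [isset($null),  empty($null),  array_key_exists('null',  $defined)],
        'false' => [isset($false), empty($false), array_key_exists('false', $defined)],
        'true'  => [isset($true),  empty($true),  array_key_exists('true',  $defined)],
    ];
}

// Render a boolean the way the table does.
function mark(bool $b): string
{
    return $b ? 'X' : '-';
}

$rows = probe();
printf("%-8s%-8s%-8s%s\n", '', 'isset', 'empty', 'defined');
foreach ($rows as $name => $r) {
    printf("%-8s%-8s%-8s%s\n", $name, mark($r[0]), mark($r[1]), mark($r[2]));
}
```

The checks run inside a function so that the snapshot contains only the three assigned locals, with nothing from the global scope mixed in.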

In a situation where a function scope could contain possibly-undefined variables (for example, from a php include or php eval), you actually have to inspect the result of php get_defined_vars, being careful of superglobals and other things that could have crept into your scope. If you can contain the point of possible declaration, you might diff two calls to php get_defined_vars before and after that point to see what was added.
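The before/after diff might look like the following sketch, assuming PHP 7 or later. vars_added_by_eval() is a hypothetical helper, and it uses php eval as the point of possible declaration; it would misreport if the evaluated code itself assigned $code, $before, or $after:

```php
<?php
// Hypothetical helper: run $code in a fresh function scope and report
// which variables it introduced there, including ones set to NULL.
function vars_added_by_eval(string $code): array
{
    $before = get_defined_vars();   // only 'code' exists at this point

    eval($code);                    // the point of possible declaration

    $after = get_defined_vars();    // now also contains 'before' itself
    unset($after['before']);        // drop our own bookkeeping variable

    // Keys present after the eval but not before it.
    return array_diff_key($after, $before);
}

$added = vars_added_by_eval('$x = null; $y = 2;');
$names = array_keys($added);
print_r($names);
```

Because the comparison is by key rather than by value, a variable that was declared with the value NULL still shows up as added, which isset alone would not reveal.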

The name "empty" itself is especially confusing; php empty has nothing to do with testing whether a string is empty (use php strlen) or whether an object is empty (use (array)$object):

$ php -r 'error_reporting(-1); $a=(object)(array()); var_dump(empty($a));'
bool(false)

$ php -r 'error_reporting(-1); $a="0"; var_dump(empty($a));'
bool(true)
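For contrast, here is a sketch of the checks suggested above, assuming PHP 7.2 or later (for the object type declaration). is_empty_string() and has_no_properties() are hypothetical helper names:

```php
<?php
// Hypothetical helper: a string is empty only if it has zero length.
function is_empty_string(string $s): bool
{
    return strlen($s) === 0;
}

// Hypothetical helper: an object is "empty" if casting it to an array
// yields no elements, that is, it has no properties.
function has_no_properties(object $o): bool
{
    return count((array) $o) === 0;
}

$zero   = "0";
$object = (object) array();

$checks = [
    'empty("0")'                => empty($zero),
    'is_empty_string("0")'      => is_empty_string($zero),
    'is_empty_string("")'       => is_empty_string(""),
    'empty($object)'            => empty($object),
    'has_no_properties($object)' => has_no_properties($object),
];
var_dump($checks);
```

The two approaches disagree on both inputs: php empty calls the non-empty string "0" empty and the property-less object non-empty.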

This leads to confused developers, enormous truth tables, and a very broad selection of articles about how to handle the situation.

The implementation of php empty is as follows:

The parser rule is in Zend/zend_language_parser.y:

internal_functions_in_yacc:
                T_ISSET '(' isset_variables ')' { $$ = $3; }
        |       T_EMPTY '(' variable ')'        { zend_do_isset_or_isempty(ZEND_ISEMPTY, &$$, &$3 TSRMLS_CC); }
        |       T_INCLUDE expr                  { zend_do_include_or_eval(ZEND_INCLUDE, &$$, &$2 TSRMLS_CC); }
        |       T_INCLUDE_ONCE expr     { zend_do_include_or_eval(ZEND_INCLUDE_ONCE, &$$, &$2 TSRMLS_CC); }
        |       T_EVAL '(' expr ')'     { zend_do_include_or_eval(ZEND_EVAL, &$$, &$3 TSRMLS_CC); }
        |       T_REQUIRE expr                  { zend_do_include_or_eval(ZEND_REQUIRE, &$$, &$2 TSRMLS_CC); }
        |       T_REQUIRE_ONCE expr             { zend_do_include_or_eval(ZEND_REQUIRE_ONCE, &$$, &$2 TSRMLS_CC); }
;

This matches only variable tokens and emits the result of zend_do_isset_or_isempty(ZEND_ISEMPTY, &$$, &$3 TSRMLS_CC) from Zend/zend_compile.c:

void zend_do_isset_or_isempty(int type, znode *result, znode *variable TSRMLS_DC)
{
        zend_op *last_op;

        zend_do_end_variable_parse(variable, BP_VAR_IS, 0 TSRMLS_CC);

        zend_check_writable_variable(variable);

        if (variable->op_type == IS_CV) {
                last_op = get_next_op(CG(active_op_array) TSRMLS_CC);
                last_op->opcode = ZEND_ISSET_ISEMPTY_VAR;
                SET_NODE(last_op->op1, variable);
                SET_UNUSED(last_op->op2);
                last_op->result.var = get_temporary_variable(CG(active_op_array));
                last_op->extended_value = ZEND_FETCH_LOCAL | ZEND_QUICK_SET;
        } else {
                last_op = &CG(active_op_array)->opcodes[get_next_op_number(CG(active_op_array))-1];

                switch (last_op->opcode) {
                        case ZEND_FETCH_IS:
                                last_op->opcode = ZEND_ISSET_ISEMPTY_VAR;
                                break;
                        case ZEND_FETCH_DIM_IS:
                                last_op->opcode = ZEND_ISSET_ISEMPTY_DIM_OBJ;
                                break;
                        case ZEND_FETCH_OBJ_IS:
                                last_op->opcode = ZEND_ISSET_ISEMPTY_PROP_OBJ;
                                break;
                }
        }
        last_op->result_type = IS_TMP_VAR;
        last_op->extended_value |= type;

        GET_NODE(result, last_op->result);
}

The ZEND_ISSET_ISEMPTY_VAR opcode is defined by ZEND_VM_HANDLER(114, ZEND_ISSET_ISEMPTY_VAR, CONST|TMP|VAR|CV, UNUSED|CONST|VAR) in Zend/zend_vm_def.h and, after computing the isset flag, calculates the final result with this block:

if (!isset || !i_zend_is_true(*value)) {
        ZVAL_BOOL(&EX_T(opline->result.var).tmp_var, 1);
} else {
        ZVAL_BOOL(&EX_T(opline->result.var).tmp_var, 0);
}
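In userland terms, that block behaves roughly like the sketch below, assuming PHP 7 or later. emulated_empty() is a hypothetical function; it models the symbol-table lookup with array_key_exists and approximates i_zend_is_true with a (bool) cast:

```php
<?php
// Hypothetical userland model of the handler's final step.
// $scope stands in for the symbol table (e.g. a get_defined_vars() result).
function emulated_empty(array $scope, string $name): bool
{
    // "isset" here means only "found in the symbol table"; NULL still counts.
    $isset = array_key_exists($name, $scope);

    // Mirrors: if (!isset || !i_zend_is_true(*value)) result = 1; else result = 0;
    // The || short-circuits, so a missing key is never read.
    return !$isset || !(bool) $scope[$name];
}

$scope = ['null' => null, 'false' => false, 'true' => true, 'zero' => '0'];

$model = [];
foreach (['undef', 'null', 'false', 'true', 'zero'] as $name) {
    $model[$name] = emulated_empty($scope, $name);
}
var_dump($model);
```

Under those assumptions the model agrees with the "empty" column of the table above, and it also treats the string "0" as empty.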

The ZEND_ISSET_ISEMPTY_DIM_OBJ and ZEND_ISSET_ISEMPTY_PROP_OBJ opcodes dispatch to zend_isset_isempty_dim_prop_obj_handler in Zend/zend_vm_def.h, which provides special cases for object properties, array elements, and string offsets.

Significance: Chaining

The ability to chain function calls, array lookups, property accesses, and so on is important to being able to write clean, concise code. Requiring the developer to needlessly create extra temporary variables merely to pass a value from one language construct to another encourages lengthy, messy code which is costly to change and difficult to follow.

Significance: Implications for Internals

The mere presence of this issue tends to imply some fatal flaw or unnecessary complexity at the most basic levels of the language. For example, an overly complex parser might be trying to compensate for missing functionality in the interpreter by incorrectly (and misleadingly) validating code at the syntax level, or messages without details could indicate that the internal design prohibits access to values where they should be reachable in a sane implementation.