Thursday, June 30, 2011

split a large file into small files

If we want to split a large file into several smaller files, we can use the split command (on Linux).

split --bytes=SIZE largeFile smallFilePrefix

For example, I have a file a little larger than 50M (well, maybe that is not really large). If I want to split it into several 1M files, I can do:

split --bytes=1M largeFile smallFile

Then you will find that a lot of smallFile* files were generated (by default split names them smallFileaa, smallFileab, and so on). Let's count them:

ls smallFile* | wc -l

I got around 53 of them.

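The shell expands smallFile* in alphabetical order, which is the same order in which split created the pieces, so if we want to restore the original file, we can do: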

cat smallFile* > originalFile
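To see why this restores the file, here is a small in-memory Node.js sketch that imitates split and cat. The helper names (suffix, splitBuffer, joinPieces) are hypothetical, and it only works on a Buffer, not on real files: it cuts the data into fixed-size chunks, names them with two-letter suffixes so that alphabetical order equals chunk order, and joins them back.

```javascript
// Hypothetical in-memory imitation of `split --bytes=SIZE` and `cat prefix*`.

// Two-letter suffix like split's default: 0 -> "aa", 1 -> "ab", 26 -> "ba".
// (Two letters only cover 26 * 26 = 676 pieces.)
function suffix(index) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  return letters[Math.floor(index / 26)] + letters[index % 26];
}

// Cut `data` into pieces of at most `size` bytes, keyed by file name.
function splitBuffer(data, size, prefix) {
  const pieces = {};
  for (let offset = 0, n = 0; offset < data.length; offset += size, n++) {
    pieces[prefix + suffix(n)] = data.subarray(offset, offset + size);
  }
  return pieces;
}

// Like `cat prefix*`: the glob expands in sorted order,
// so sorting the names puts the chunks back in their original order.
function joinPieces(pieces) {
  const names = Object.keys(pieces).sort();
  return Buffer.concat(names.map((name) => pieces[name]));
}

const original = Buffer.alloc(2.5 * 1024 * 1024, 7); // 2.5 MiB of dummy bytes
const pieces = splitBuffer(original, 1024 * 1024, "smallFile");
console.log(Object.keys(pieces)); // [ 'smallFileaa', 'smallFileab', 'smallFileac' ]
console.log(joinPieces(pieces).equals(original)); // true
```

The number of pieces is the file size divided by the piece size, rounded up, which is why a file a little over 50M gives a few more than 50 pieces.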

using sed to replace spaces

I want to replace runs of multiple spaces (but keep tabs) with a single space in a file. Here is how we can do it:

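sed 's/ \+/ /g' targetFile > newFile

The pattern matches only the space character, so tabs are left alone. Note that \+ is a GNU sed extension; with a strictly POSIX sed, use 's/  */ /g' (two spaces before the asterisk) instead.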
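The same substitution can be sketched with a JavaScript regular expression (collapseSpaces is a hypothetical helper name). It also shows why the pattern must be a literal space rather than \s, which would swallow the tabs too:

```javascript
// Same idea as sed 's/ \+/ /g': collapse each run of one or more
// space characters into a single space. Only ' ' is matched,
// so tabs and newlines are left untouched.
function collapseSpaces(text) {
  return text.replace(/ +/g, " ");
}

console.log(JSON.stringify(collapseSpaces("a   b\t\tc  d"))); // "a b\t\tc d"

// By contrast, \s matches tabs as well, so they get replaced too.
console.log(JSON.stringify("a   b\t\tc".replace(/\s+/g, " "))); // "a b c"
```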

install xdebug on windows

1. Download the Xdebug Windows binary from http://xdebug.org/download.php. If you are using Apache, make sure you download the "PHP X.X XXX TS(XX XXX)" build; TS here means thread safe. If you are using IIS, download the NTS (non-thread-safe) version.

2. After downloading the .dll file, put it in the php/ext folder and rename it to php_xdebug.dll.

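3. Edit the php.ini file and add these lines. Xdebug is a Zend extension, so it is loaded with zend_extension rather than extension, and it should be given the full path to the .dll (the path below is only a placeholder; use your own):

zend_extension="C:\path\to\php\ext\php_xdebug.dll"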

[Xdebug]
xdebug.profiler_enable=on
xdebug.trace_output_dir="d:\xdebug"
xdebug.profiler_output_dir="d:\xdebug"
xdebug.dump.GET=*
xdebug.show_local_vars=1

"d:\xdebug" is just the folder where you want to save xdebug's output.

Restart the web server and run phpinfo(). If the output contains an xdebug section, the installation succeeded.

Tuesday, June 28, 2011

javascript tricky questions, their answers & explanation

The questions below are just tricky questions. I answer them simply for fun.

1.
(function () { 
    return typeof arguments; 
})(); 

A. "object"
B. "array"
C. "arguments"
D. "undefined"

Answer: A
Reason: Inside a function, arguments is a local, array-like object that holds all the arguments passed to the function. It is not a real array: it has a length property and indexed elements, but no array methods such as slice. typeof arguments simply tells 'object'. Although this information is not that useful, it would be the same even for a real array, because a javascript array is also an object.
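A quick way to confirm that arguments is array-like rather than an array (inspectArguments is a hypothetical helper; Array.isArray and Array.prototype.slice are standard):

```javascript
function inspectArguments() {
  return {
    typeofArgs: typeof arguments,                   // "object"
    isArray: Array.isArray(arguments),              // false: array-like, not an Array
    length: arguments.length,                       // it does have a length
    hasSlice: typeof arguments.slice,               // "undefined": no array methods
    asArray: Array.prototype.slice.call(arguments)  // copy into a real array
  };
}

console.log(inspectArguments(1, 2, 3));
console.log(typeof [1, 2, 3]); // prints: object (arrays are objects too)
```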

2.
var f = function g() {
        return 23;
    };
typeof g();

A. "number"
B. "undefined"
C. "function"
D. Error

Answer: D
Reason: This is a named function expression, and 'g' is the function name. The important part is that this name is visible only to the code inside the function (which could call g() to refer to the function itself), while the code outside the function cannot see it at all. So calling g() outside throws a ReferenceError.

3.
(function (x) {
    delete x;
    return x;
})(1);

A. 1
B. null
C. undefined
D. Error

Answer: A
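Reason: delete only removes object properties. You can't use it to delete variables or function parameters, no matter whether they hold functions, objects, or primitives. In non-strict code, delete x simply returns false and x keeps its value; in strict mode, delete on a plain identifier is a SyntaxError.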

4.
(function f(f) {
    return typeof f();
})(function () {
    return 1;
});

A. "number"
B. "undefined"
C. "function"
D. Error

Answer: A
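Reason: Not hard; the code is just not easy to read. The parameter f shadows the function's own name f, so inside the body f refers to the function passed in, which returns 1.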

5.
var foo = {
    bar: function () {
        return this.baz;
    },
    baz: 1
};
(function () {
    return typeof arguments[0]();
})(foo.bar);

A. "undefined"
B. "object"
C. "number"
D. "function"

Answer: A
Reason: Don't be fooled into picking C. This is a typical callback scope issue: arguments[0]() calls the same function as foo.bar(). However, inside foo.bar there is the keyword 'this', and in javascript 'this' is decided by how a function is called, not by the object it was defined in. So, where is 'this' pointing to? Look at the way we do the callback here: arguments[0](). That is a property access followed by a call, so 'this' actually refers to the object 'arguments', which has no baz property. If we modify the code a little bit:
(function () {
    arguments.baz = 3;
    return typeof arguments[0]();
})(foo.bar);

This time we get 'number'. I wrote a post about javascript callback function scope: http://hengrui-li.blogspot.com/2011/05/javascript-callback-function-scope.html
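Since 'this' is decided at call time, the usual fixes are to bind the function to foo, or to call it with an explicit receiver. A small sketch: callFirst is a hypothetical helper standing in for the anonymous function in the question, while bind and call are standard Function methods.

```javascript
var foo = {
  bar: function () {
    return this.baz;
  },
  baz: 1
};

function callFirst() {
  // arguments[0]() is a property access followed by a call, so inside
  // bar `this` is the arguments object, which has no baz property.
  return arguments[0]();
}

console.log(typeof callFirst(foo.bar));           // undefined
// bind() fixes `this` to foo no matter how the function is called later.
console.log(typeof callFirst(foo.bar.bind(foo))); // number
// call() sets `this` explicitly for a single invocation.
console.log(typeof foo.bar.call(foo));            // number
```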

6.
var f = (function f() {
    return "1";
}, function g() {
    return 2;
})();
typeof f;

A. "string"
B. "number"
C. "function"
D. "undefined"

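Answer: B
Reason: The comma operator evaluates both function expressions and returns the last one, g, which is then invoked immediately. So f === 2, and typeof f is 'number'.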

7.
var x = 1;
if (function f() {}) {
    x += typeof f;
}
x;

A. 1
B. "1function"
C. "1undefined"
D. NaN

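Answer: C
Reason: function f() {} inside the if condition is a function expression, not a declaration, so the name f is visible only inside that function. The function object is truthy, so the if body runs, but there f is undeclared, and typeof on an undeclared identifier returns 'undefined' instead of throwing. This is exactly like:
var x = 1;
if (g = function f() {}) {
    x += typeof f;
}
x;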

8.
var x = [typeof x, typeof y][1];
typeof typeof x;

A. "number"
B. "string"
C. "undefined"
D. "object"

Answer: B
Reason: Actually, you don't really have to figure out what x is (here it is typeof y, which is the string 'undefined'). No matter what it is, typeof x must be a string naming its type, such as 'undefined', 'string', 'object' or 'number', so typeof typeof x must be 'string'.

9.
(function (foo) {
    return typeof foo.bar;
})({
    foo: {
        bar: 1
    }
});

A. "undefined"
B. "object"
C. "number"
D. Error

Answer: A
Reason: Another purely tricky question. In return typeof foo.bar, foo is the parameter, which is actually the whole object {foo:{bar:1}}. That object has no bar property; foo.foo.bar would be 1.

10.
(function f() {
    function f() {
        return 1;
    }
    return f();

    function f() {
        return 2;
    }
})();

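A. 1
B. 2
C. Error (e.g. "Too much recursion")
D. undefined

Answer: B
Reason: Function hoisting. Both inner declarations of f are hoisted to the top of the outer function before any code runs, and the second one overwrites the first, so f() calls the version that returns 2. The inner f also shadows the outer function's name, so there is no recursion.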

11.
function f() {
    return f;
}
new f() instanceof f;

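A. true
B. false

Answer: B
Reason: When a constructor called with new returns an object, new returns that object instead of the newly created instance. Functions are objects, so new f() returns f itself, and f instanceof f is false because f's own prototype chain (Function.prototype, then Object.prototype) does not contain f.prototype. More about javascript singletons: http://hengrui-li.blogspot.com/2011/05/javascript-design-patterns-singleton.html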
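The rule about constructor return values can be checked directly. G is a hypothetical second constructor added for contrast: a primitive return value is ignored by new, an object return value replaces the new instance.

```javascript
function f() {
  return f; // returning an object (functions are objects) from a constructor
}

var result = new f();
console.log(result === f);        // true: new handed back f itself
console.log(result instanceof f); // false: f's prototype chain does not include f.prototype

function G() {
  this.name = "g";
  return 42; // a primitive return value is ignored by new
}
console.log(new G() instanceof G); // true
```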

Sunday, June 26, 2011

Cool PHP Documentation update

1. pman - php man pages
Linux users know that the 'man' command is very handy. If you want to learn the usage of the 'ls' command, you can simply type man ls.
pman provides PHP man pages. To install:

sudo pear channel-update doc.php.net
sudo pear install doc.php.net/pman

After the installation, you can simply run 'pman <php function>'. For example, if you want to learn about PHP's empty function, you can type: pman empty

2. Enhanced CHM PHP manual. (Download: http://www.php.net/download-docs.php)
I know we can get help from the internet anytime, anywhere now, but I always like to keep an offline copy of the manual. There are several benefits of keeping offline documentation:
* you can access it even when the internet connection is lost.
* you can read and learn from it just like reading a book, so you are not limited to a specific topic and can browse the big picture.
* it is fast and easy to browse and search.

The standard PHP manual, however, does NOT contain the user notes, which sometimes hold very valuable information. The enhanced manual addresses this issue: it includes the user notes, which makes the offline manual much better and more valuable.

3. The online documentation editor. You can log in as an anonymous user. I didn't spend much time on it. The only interesting part to me is that this online editor is obviously powered by ExtJS, which I was working on recently.

Thursday, June 23, 2011

mysql replace text in field

Task: Assume we have a table called articles, with a column/field called title. We have a lot of titles like "ubuntu administration tips xxx". One day we find that all these tips actually apply to all Linux systems, so we want to change every title from "ubuntu administration tips xxx" to "linux administration tips xxx".

Here is how we can do it quickly:

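UPDATE articles SET title = REPLACE(title, 'ubuntu', 'linux');

Note that REPLACE() is case-sensitive and replaces every occurrence of 'ubuntu' anywhere in every title. If only the tips titles should change, add a condition such as WHERE title LIKE 'ubuntu administration tips%'.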

Monday, June 20, 2011

PHP $_REQUEST: why or why not

Many PHP developers hold this as a rule: avoid $_REQUEST and always use $_POST or $_GET. If you ask why, the answer may be 'it is not secure'. Why is using $_REQUEST not secure? Well, I believe you can NOT find strong proof. You may bring up Cross-Site Request Forgery (CSRF), but that is not $_REQUEST's fault: even if you use $_GET or $_POST, you can still suffer from CSRF, easily.

Another reason might be that your system follows this very bad practice: http://hengrui-li.blogspot.com/2011/06/php-get-post-precedence.html, which it should not.

A more decent reason is that prior to PHP 5.3, COOKIE data is also populated into $_REQUEST, and it takes higher priority than GET and POST (open php.ini and you can find variables_order = "GPCS"; read the documentation to learn more). In PHP 5.3 this issue is gone: the new request_order directive, set to "GP" in the php.ini files that ship with PHP 5.3, keeps COOKIE data out of $_REQUEST. Prior to PHP 5.3, you can change php.ini if you control the server. If you don't? Well, we will talk about that situation later.

So, what is the benefit of using $_REQUEST? The truth is, in most cases, we don't care whether the data is submitted via GET or POST. By using $_REQUEST, we gain a lot of flexibility on both the client side and the server side. For example, if we are developing an API that accepts several parameters and returns the result in a certain format, we don't really want to restrict users to calling it via POST only. That is up to the user: as long as a user submits the correct data, the API should work as expected. Allowing both POST and GET also helps debugging and testing, because we can easily simulate an API request with test data.

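OK, back to the question. What if we are on PHP 5.2, cannot change the server settings, and still want this flexibility? Simply do this: $params = array_merge($_GET, $_POST); (for string keys, array_merge lets the later array win, so POST values override GET values, just like request_order = "GP").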

Sunday, June 19, 2011

PHP GET & POST precedence

First of all, it is a very bad practice for a URL to contain a parameter with the same name as a parameter in a POST form. For example:

<form method="POST" action="submit.php?action=doThis">
...
...
<input type="hidden" name="action" value="doThat">
</form>

You should always avoid this.

But here, we are not talking about good development practice. We are just looking into this issue from a technical point of view.

So, if the above situation happens, what is the value of "action"?

If we var_dump($_GET['action']), we can find the value is 'doThis'. If we var_dump($_POST['action']), the value is 'doThat'.

So far so good, no confusion. But what if we use $_REQUEST['action']? It turns out that var_dump($_REQUEST['action']) gives 'doThat'! The POST value takes precedence.

In PHP, by default, POST has higher priority than GET, and we can change that in php.ini if we want. Take PHP 5.3.3's php.ini as an example: we can find the directive request_order = "GP". The documentation states "This directive determines which super global data (G,P,C,E & S) should be registered into the super global array REQUEST. If so, it also determines the order in which that data is registered".

G = GET, P = POST, C = COOKIE, E=ENV, S = SERVER.

So request_order = "GP" means GET data is registered into the $_REQUEST array first, and then POST data, so POST values override GET values with the same name in $_REQUEST.
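The merge order can be modeled in a few lines of JavaScript. This is only a sketch of the behavior described above, not PHP's actual implementation; buildRequest and the sources object are hypothetical names:

```javascript
// Hypothetical model of filling $_REQUEST according to request_order:
// sources are merged left to right, so a later letter overwrites
// an earlier one when the same key appears in both.
function buildRequest(order, sources) {
  var request = {};
  for (var i = 0; i < order.length; i++) {
    var source = sources[order[i]] || {};
    Object.keys(source).forEach(function (key) {
      request[key] = source[key];
    });
  }
  return request;
}

var sources = {
  G: { action: "doThis" }, // from the URL: submit.php?action=doThis
  P: { action: "doThat" }  // from the hidden form field
};

console.log(buildRequest("GP", sources).action); // doThat (POST wins)
console.log(buildRequest("PG", sources).action); // doThis (GET wins)
```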

You may also want to look at variables_order = "GPCS" in your php.ini.

So how do we avoid this potential confusion? For me, the only correct and clean solution is never to use this bad practice in your development. Some developers offer another suggestion: don't use $_REQUEST. Well, personally, I don't completely agree with that suggestion, but I will talk about it in another post.

Saturday, June 18, 2011

linux rename multiple files

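Have you ever had multiple files like a.inc, b.inc, c.inc and wanted to change them to a.php, b.php, c.php? Basically, we want to change the extensions of multiple files in Linux: bulk rename files.

The rename command can get this task done. (This is the Perl-based rename shipped with Debian and Ubuntu; on some other distributions, such as Red Hat, rename comes from util-linux and takes a different syntax.)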

rename [ -v ] [ -n ] [ -f ] perlexpr [ files ]

-v Verbose: print names of files successfully renamed
-n No Action: show what files would have been renamed
-f Force: overwrite existing files

perlexpr: a Perl expression. This might be the headache for most users.

files: the files to rename, usually given as a shell pattern.

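So, if we want to rename all files matching *.inc to *.php, we can do:

rename -v 's/\.inc$/.php/' *.inc

In this command, s means substitute: the expression replaces \.inc$ (a literal dot followed by inc, at the end of the name) with .php. The dot is escaped because an unescaped . matches any character, and $ anchors the match to the end of the file name. The last *.inc passes every file matching *.inc to rename. Running it with -n first shows what would be renamed without touching anything.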
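Why the escaping and anchoring matter can be checked with JavaScript regular expressions, which behave like Perl's for these patterns. The file names below are made up for illustration:

```javascript
var names = ["a.inc", "b.inc", "zinc.inc"];

// Unescaped '.' matches any character and the pattern is not anchored,
// so the first match in "zinc.inc" is "zinc", not ".inc".
var loose = names.map(function (n) { return n.replace(/.inc/, ".php"); });

// Escaped dot plus end-of-string anchor: only the real extension changes.
var strict = names.map(function (n) { return n.replace(/\.inc$/, ".php"); });

console.log(loose);  // [ 'a.php', 'b.php', '.php.inc' ]
console.log(strict); // [ 'a.php', 'b.php', 'zinc.php' ]
```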

Thursday, June 16, 2011

GET or POST method

A very old topic, but always remember the guiding rules:

1. If an HTTP request does not result in a state-changing action, no matter how many times it is submitted, use GET.
For example, searching repeatedly will not change the database's contents (well, you may record statistics on the searched keywords, which does change the database, but you know what I mean: the request should have no detrimental effect if it is submitted repeatedly), so a search form is a good candidate for the GET method.

2. If an HTTP request causes a state-changing action, use POST. For instance, charging a credit card.

Now let's explain the reasons behind these rules. First of all, we have to know the differences between GET and POST.

The first and very obvious difference is that with GET, the data is submitted in the URL, while POST sends it in the request body. You may also know that the maximum URL length depends on the browser.

Another difference is that if a page was produced by a POST request and the user tries to reload it, the browser asks the user whether to submit the data again, which prevents the request from being executed again accidentally.

Now, assume we just submitted our credit card data and got charged. If the method is GET and we accidentally reload the page, the browser will not warn us at all, and we may get charged twice. So we had better use POST in this case.

Anyway, they are just guiding rules; it doesn't mean you have to follow them every time. In the end, it all depends on the requirements. For example, if you are developing a bunch of APIs and decide that all requests must be submitted using the POST method, then just do it.

Wednesday, June 15, 2011

javascript: prefer literal notation over constructor functions

In javascript, we can create an object using an object literal:
var obj = {};

We can also create an object using a constructor function:
var obj = new Object();

I think most people from other OOP backgrounds will feel more comfortable using the constructor function. That was exactly what I did until I learned that it is not encouraged in javascript.

So, why should we favor literal notation over constructor functions?
1. Literal notation is obviously shorter to type. For me, though, that is not a big deal: if a constructor function made things easier to understand and reduced confusion, I wouldn't mind typing a bit more.
2. It emphasizes that javascript objects are simply mutable hashes and not something that needs to be baked from a “recipe” (from a class). This reason sounds more convincing to me.
3. The Object() constructor accepts a parameter and, depending on the value passed, it may delegate the object creation to another built-in constructor and return a different object than you expect.


// an empty object
var o = new Object();
console.log(o.constructor === Object); // true

// a string object
var o = new Object("I am a string");
console.log(o.constructor === String); // true

// normal objects don't have a substring() method,
// but string objects do
console.log(typeof o.substring); // "function"


This behavior of the Object() constructor can lead to unexpected results when the value you pass to it is dynamic and not known until runtime.

Let's have a look at the array literal. To declare an array, we can simply do:
var a = [];
or
var a = new Array();

There is one obvious reason we should definitely avoid the Array constructor. Let's go through it in detail. Say we create arrays with the elements 1, 2, 3:
//using literal
var a = [1,2,3];
//using constructor
var b = new Array(1,2,3);
//so far so good
for(var i=0; i<3; i++){
    console.log(a[i],b[i]);
}

Everything looks fine. Now, we simply want to create an array with the single element 1:
//using literal
var a = [1];

//using constructor
var b = new Array(1);
//now see the funny part
console.log(a[0]); //no problem, we get 1
console.log(b[0]); //big problem! we get 'undefined'!!!

How does this happen? It turns out that when we pass a single number to the Array() constructor, it doesn't become the value of the first array element; it sets the length of the array instead! So new Array(1) creates an array with a length of 1 but no actual elements. How awful!

What if we do var b = new Array(3.14)? You will simply get "RangeError: invalid array length", because a single numeric argument must be a valid array length (a non-negative integer).
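The cases above can be checked side by side. Array.of is a later addition (ECMAScript 2015, newer than this post) that always treats its arguments as elements:

```javascript
var a = [1];
var b = new Array(1);

console.log(a.length, a[0]); // 1 1
console.log(b.length, b[0]); // 1 undefined
console.log(0 in b);         // false: the slot is empty, not set to undefined

var c = new Array(1, 2);     // two or more arguments become elements
console.log(c);              // [ 1, 2 ]

var rangeErrorThrown = false;
try {
  new Array(3.14);           // a single non-integer number is rejected
} catch (e) {
  rangeErrorThrown = e instanceof RangeError;
}
console.log(rangeErrorThrown); // true

console.log(Array.of(1));    // [ 1 ]
```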

Thursday, June 9, 2011

a book about server side javascript

More specifically, it is a book about Node.js: "Node: Up and Running: Scalable Server-Side Code with JavaScript".

The book will be available on September 22, 2011, and I'm really looking forward to it.

is server side javascript going to replace PHP?

Today I read this news on Twitter: "Douglas Crockford says Yahoo is the biggest PHP factory in the world, and is looking to replace it with Node.js (Server-Side JavaScript)".

Wow, if that is true, it is really exciting. I don't want to compare javascript with PHP from a language point of view. I simply believe that if we can use one language on both the frontend and the backend, the web will become a better world. I expressed this feeling in one of my blog posts: http://hengrui-li.blogspot.com/2011/05/server-side-javascript-when-will-you.html

Although I don't think things will go quickly and easily, I would like to believe this can come true one day. Sometimes, all you need is a simple belief to hold on to.

Wednesday, June 8, 2011

javascript loop performance

javascript has several types of loops:

1. for loop
Probably the most common loop in javascript:
for (var i=0; i<items.length; i++) {
    handle(items[i]);
}

2. while loop
var i=0;
while(i < items.length) {
    handle(items[i]);
    i++;
}

3. do while loop
This loop ensures the loop body gets executed at least once (so, unlike the two loops above, it calls handle(items[0]) even if items is empty):
var i=0;
do {
   handle(items[i]);
   i++;
} while (i < items.length);

The above three loop types are common in other languages as well. Here is another loop with javascript-specific behavior:

4. for-in loop
for (var i in items) {
    handle(items[i]);
}
This loop enumerates the named properties of any object.

Performance:

Only one loop type is significantly slower than the others: the for-in loop. Each time the loop body executes, 'i' is filled with the name of another enumerable property of the object 'items', until all properties have been returned. The returned properties come either from the object instance or from its prototype chain, so each iteration results in a property lookup on the instance or a prototype, and that makes for-in slow. So, unless you have to iterate over object properties, the for-in loop should not be used.
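A short sketch of what for-in actually visits (Base is a hypothetical constructor used only for illustration). It also picks up inherited properties, and on arrays it yields string keys, two more reasons not to use it as a general array loop:

```javascript
function Base() {}
Base.prototype.inherited = "from prototype";

var items = new Base();
items.own = "own property";

var seen = [];
for (var key in items) {
  seen.push(key); // own and inherited enumerable properties
}
console.log(seen.sort()); // [ 'inherited', 'own' ]

// hasOwnProperty filters out what came from the prototype chain.
var ownOnly = [];
for (var k in items) {
  if (Object.prototype.hasOwnProperty.call(items, k)) ownOnly.push(k);
}
console.log(ownOnly); // [ 'own' ]

// On an array, the "index" is a property name, i.e. a string.
var indexTypes = [];
for (var idx in [10, 20]) indexTypes.push(typeof idx);
console.log(indexTypes); // [ 'string', 'string' ]
```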

Apart from for-in, the loop types are so close in performance that it isn't worth trying to determine which is faster. The choice should be based on your requirements instead of performance.

So if the loop type doesn't matter for performance, what does? Two things:
1. The work done in each iteration, the obvious one being the body of the loop: handle(items[i])
2. The number of iterations, obviously.

When it comes to item 1, it is not as simple as it looks. Let's go through loop type 1 in detail.

for (var i=0; i<items.length; i++) {handle(items[i]);}

1. The loop first initializes a variable i=0. Fortunately, this only has to be done once.
2. items.length is a property lookup that has to be done every time in the loop
3. i < items.length is a comparison that has to be done every time
4. (i < items.length) == true is another comparison every time
5. one increment operation i++ every time
6. array lookup every time: items[i]
7. handle(items[i]) every time.

Such a simple loop actually contains a lot of operations. The performance of this loop largely depends on the handle(items[i]) function, but reducing the total number of operations can still help.

1. We don't want to do the property lookup items.length on every iteration. Since it is quite unlikely that the length changes during the loop, we can do it just once (this is only safe if the loop body does not add or remove items):

for (var i=0, length=items.length; i<length; i++){handle(items[i]);}

2. i < items.length and (i < items.length) == true are comparisons done on every iteration. We can reduce the comparisons by reversing the loop order. Usually, the order in which items are handled is irrelevant to the task, so we can run the loop from the last item to the first:

for (var i=items.length; i--;){handle(items[i]);}

This way, we simply test 'i' against zero. A control condition is evaluated as a boolean: any nonzero number is treated as true, and zero as false. Because i-- returns the value before decrementing, the body sees i from items.length-1 down to 0, and the loop stops after that. So two comparisons (Is i less than items.length? Is that equal to true?) have been reduced to one (Is the value true?).
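Both optimized loops visit every item; only the order differs. A minimal runnable sketch with a small made-up array:

```javascript
var items = ["a", "b", "c"];
var forward = [];
var backward = [];

// Length read once and kept in a local variable.
for (var i = 0, length = items.length; i < length; i++) {
  forward.push(items[i]);
}

// Reverse loop: `j--` tests the current value of j (non-zero is truthy)
// and then decrements it, so the body sees j = 2, 1, 0 and the loop
// ends when the tested value is 0.
for (var j = items.length; j--; ) {
  backward.push(items[j]);
}

console.log(forward);  // [ 'a', 'b', 'c' ]
console.log(backward); // [ 'c', 'b', 'a' ]
```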

Finally, there is function-based iteration: forEach, introduced in the 5th edition of ECMA-262 (ECMAScript 5).
items.forEach(function(value, index, itemsArray){handle(value);});

You may have already seen similar implementations in popular js libraries. One example is Prototype's items.each().

Comparing function-based iteration with the basic loop types on performance is not entirely fair, at least in my opinion, because in some situations function-based iteration is very handy and convenient. But if we have to compare, then, as you may expect, function-based iteration is noticeably slower than the basic loop types, due to the overhead of an extra function call on every iteration.

Saturday, June 4, 2011

design patterns - null object pattern

As we know, if conditional statements are one thing that hurts program readability. The NULL object is one approach to reducing the number of if statements in your application. We are all familiar with this kind of code:

$user = $userDao->getUserById($id);
//For objects, I personally like to use instanceof for type checking,
//but more commonly, other programmers simply check $user !== null.
if ($user instanceof User) {
    //do some logic with $user here
    echo $user->getName();
} else {
    //do some other things if $user is null
    echo 'user not exists';
}


As we can see, we usually use an if statement to check whether $user is null, and then behave differently based on the result. This kind of if check can end up all over the application. The NULL object is here to help. For simplicity, I keep the interface part minimal; let's check the code:

interface iUser
{
    public function getName();
}

class User implements iUser
{
    private $_name;
    public function getName() {
        return $this->_name;
    }
}

class NullUser implements iUser
{
    public function getName() {
        return 'not exists';
    }
}

class UserDao
{
    public function getUserById($id) {
        //query the database or do whatever to get the user
        $user = $this->find($id);
        if ($user === null) {
            return new NullUser();
        }
        return $user;
    }
}

Now your domain logic becomes quite straightforward and more readable:

$user = $userDao->getUserById($id);
echo $user->getName();


Now you can see how the NULL object pattern can help simplify your business logic. However, you must be very careful when using this pattern. Although it simplifies your domain logic and increases readability, it does complicate your design and maintenance. If a programmer is not aware of a NULL object implementation, he may write a redundant null test, or he may really want to do something different when the object is null. Null objects may also cause issues in CRUD operations, for example:
if ($user !== null) {
    //db query:
    $posts= $articleDao->getUserPosts($user);
}
//if we use null object, we will do the db query anyway,
//which might not be necessary
$posts= $articleDao->getUserPosts($user);

Personally, I totally agree that we should try to reduce if statements in our application, particularly in the business domain. But when it comes to checking whether an object is null, I can accept that, because usually the logic is simple when an object is null, or there is no logic at all, for example:

$user = $userDao->getUserById($id);
if ($user instanceof User) {
    //do some logic with $user here
    echo $user->getName();
}
//that is it, we don't have to do anything when $user is null




Thursday, June 2, 2011

Design Patterns - Strategy

Strategy Pattern definition: the Strategy Pattern defines a family of algorithms, encapsulates each one, and makes them interchangeable. Strategy lets the algorithm vary independently from the clients that use it.

I know a lot of websites are quite simple, and some web developers only know CRUD; once the domain logic becomes complex, they start to struggle and start to create crap. Anyway, complex conditional logic is one thing that makes a program hard to read and understand. To improve readability, we can replace conditional logic with the Strategy pattern plus the Factory pattern.

For example, we have the logic below:
//somehow we get a format value dynamically
$format  = $objOne->getFormat();
$context = $objTwo->getContext();

if ($format === 'xml') {
    //start to format $context into xml format
} elseif ($format === 'txt') {
    //format text
} elseif ($format === $somethingElse && $someotherConditionCheck) {
    //another format implementation
}

Let's see how to use Strategy and Factory Pattern to improve the code.

From the definition, we know that we should define an interface for the family of algorithms, for example:

interface FormatStrategy
{
    public function format($content);
}

Now we can implement our algorithms. Each one is a separate class that implements the FormatStrategy interface:
class XmlFormat implements FormatStrategy
{
    public function format($content)
    {
        //do some xml format
        return $content . ' in xml';
    }
}

class TxtFormat implements FormatStrategy
{
    public function format($content)
    {
        //do some txt format
        return $content . ' in text';
    }
}
We defined two algorithms, and they both implement the same interface. Next, we need some kind of strategy selector (a factory):

class FormatStrategySelector
{
    public static function getStrategy($strategy)
    {
        switch ($strategy)
        {
            case 'xml':
                return new XmlFormat();
            case 'txt':
            default:
                //txt is also the fallback for unknown formats
                return new TxtFormat();
        }
    }
}

To use the strategy:

$format  = $objOne->getFormat();
$context = $objTwo->getContext();

$formattedContent = FormatStrategySelector::getStrategy($format)->format($context);
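For comparison, here is a JavaScript sketch of the same idea. It swaps the switch-based selector for an object map used as a registry (a common variant, not the PHP code above); the strategies object and getStrategy function are hypothetical names:

```javascript
// Each strategy is an object exposing format(content). Plain objects
// play the role of the classes implementing FormatStrategy.
var strategies = {
  xml: { format: function (content) { return content + " in xml"; } },
  txt: { format: function (content) { return content + " in text"; } }
};

// Selector (factory): look the strategy up by name and fall back to
// txt, mirroring the default branch of the PHP switch.
function getStrategy(name) {
  return Object.prototype.hasOwnProperty.call(strategies, name)
    ? strategies[name]
    : strategies.txt;
}

console.log(getStrategy("xml").format("report")); // report in xml
console.log(getStrategy("csv").format("report")); // report in text
```

With a map, supporting a new format means adding one entry rather than editing a switch. The hasOwnProperty check also keeps names like "toString" from resolving to inherited properties.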