Patching exuberant-ctags for better PHP5 support in vim

by Sander Marechal

Thanks to the taglist vim plugin, vim users have access to a decent tag browser. A tag browser lets you view a list of all variables, functions and classes and quickly jump to their definitions. Taglist is built on exuberant ctags, so it supports a large number of languages. Unfortunately, when the exuberant ctags developers replaced their old PHP lexer with a new regexp-based parser, the quality of PHP parsing dropped dramatically. Ctags could no longer distinguish real class and function declarations from mere mentions of the words “class” and “function” in multi-line comments. This is because the ctags regular expression parser is inherently line-oriented.

In this article I present two patches that greatly improve PHP support in exuberant-ctags. I will also show you how to apply these patches on a Debian-based system.

Update 2009-07-07: David Mudrak has written an updated patch for Gentoo. I have ported his improvements back to my patch here. They fix a problem with old-style functions that lack a visibility declaration and apply the same trick to interfaces as well.

Properly handle multi-line comments

The exuberant ctags regular expression parser works on a line-by-line basis. The problem with this is that it cannot detect the start and end of a multi-line comment. Any comment line in which the word “class” or “function” is followed by another word will make ctags think that it is a declaration instead of simply a comment.

Short of writing a full PHP lexer this cannot be easily fixed, but the patch below works around it by making a few assumptions. Almost everyone writes PHP code in such a way that a function or class declaration starts on a new line. Most people use docblock-style multi-line comments, so every line inside a comment starts with an asterisk. By simply looking at what is in front of “class” or “function” you can tell in almost every case whether it is a declaration or not. Only whitespace or keywords such as “public”, “static” or “abstract” should precede a class or function declaration.

Download php-multiline-comment-v2.patch

    --- exuberant-ctags-5.7/php.c   2007-06-24 21:57:09.000000000 +0200
    +++ exuberant-ctags-5.7-multiline/php.c 2009-07-07 17:34:27.000000000 +0200
    @@ -11,6 +11,8 @@
     *   variables.
     *
     *   Parsing PHP defines by Pavel Hlousek <pavel.hlousek@seznam.cz>, Apr 2003.
    +*   Multiline comment fixes by David Mudrak <david.mudrak@gmail.com>, Jul 2009
    +*      based on a patches by Sander Marechal <s.marechal@jejik.com>, Nov 2008.
     */

     /*
    @@ -64,14 +66,14 @@

     static void installPHPRegex (const langType language)
     {
    -       addTagRegex(language, "(^|[ \t])class[ \t]+([" ALPHA "_][" ALNUM "_]*)",
    -               "\\2", "c,class,classes", NULL);
    -       addTagRegex(language, "(^|[ \t])interface[ \t]+([" ALPHA "_][" ALNUM "_]*)",
    +       addTagRegex(language, "(^[ \t]*)(abstract[ \t]+)?class[ \t]+([" ALPHA "_][" ALNUM "_]*)",
    +               "\\3", "c,class,classes", NULL);
    +       addTagRegex(language, "(^[ \t]*)interface[ \t]+([" ALPHA "_][" ALNUM "_]*)",
                    "\\2", "i,interface,interfaces", NULL);
    -       addTagRegex(language, "(^|[ \t])define[ \t]*\\([ \t]*['\"]?([" ALPHA "_][" ALNUM "_]*)",
    +       addTagRegex(language, "(^[ \t]*)define[ \t]*\\([ \t]*['\"]?([" ALPHA "_][" ALNUM "_]*)",
                    "\\2", "d,define,constant definitions", NULL);
    -       addTagRegex(language, "(^|[ \t])function[ \t]+&?[ \t]*([" ALPHA "_][" ALNUM "_]*)",
    -               "\\2", "f,function,functions", NULL);
    +       addTagRegex(language, "(^[ \t]*)(public[ \t]+|protected[ \t]+|private[ \t]+)?(static[ \t]+)?function[ \t]+&?[ \t]*([" ALPHA "_][" ALNUM "_]*)",
    +               "\\4", "f,function,functions", NULL);
            addTagRegex(language, "(^|[ \t])\\$([" ALPHA "_][" ALNUM "_]*)[ \t]*=",
                    "\\2", "v,variable,variables", NULL);

    @@ -87,7 +89,7 @@
     /* Create parser definition structure */
     extern parserDefinition* PhpParser (void)
     {
    -       static const char *const extensions [] = { "php", "php3", "phtml", NULL };
    +       static const char *const extensions [] = { "php", "php3", "php4", "php5", "phtml", NULL };
            parserDefinition* def = parserNew ("PHP");
            def->extensions = extensions;
            def->initialize = installPHPRegex;

I have already submitted this patch to the upstream exuberant-ctags project.
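
The effect of the function pattern can be checked outside ctags. The following is a minimal Python sketch, not part of the patch. It copies the old and new function patterns from the diff above and applies them line by line to a small, made-up PHP sample. Two assumptions are worth flagging: the ALPHA and ALNUM macros are replaced by plain ASCII character ranges, and Python's re module stands in for the regular expression library that ctags uses, which should behave the same for patterns this simple. The helper name tags and the sample code are hypothetical.

```python
import re

# Assumption: ASCII stand-ins for the ALPHA and ALNUM macros used in php.c.
NAME = r"[A-Za-z_][A-Za-z0-9_]*"

# Function pattern before the patch: "function" may follow any space or tab.
OLD_FUNCTION = re.compile(r"(^|[ \t])function[ \t]+&?[ \t]*(" + NAME + ")")

# Function pattern after the patch: only whitespace and the listed
# modifiers may appear between the start of the line and "function".
NEW_FUNCTION = re.compile(
    r"(^[ \t]*)(public[ \t]+|protected[ \t]+|private[ \t]+)?(static[ \t]+)?"
    r"function[ \t]+&?[ \t]*(" + NAME + ")"
)

# Hypothetical PHP source with a docblock that mentions the word "function".
SAMPLE = """\
<?php
/**
 * Call this function in a loop.
 */
class Foo
{
    public static function bar()
    {
    }
}
"""


def tags(pattern, group, source):
    """Apply the pattern to each line separately, as a line-oriented parser does."""
    found = []
    for line in source.splitlines():
        match = pattern.search(line)
        if match:
            found.append(match.group(group))
    return found


print(tags(OLD_FUNCTION, 2, SAMPLE))  # ['in', 'bar']
print(tags(NEW_FUNCTION, 4, SAMPLE))  # ['bar']
```

The old pattern picks up “in” from the comment, the new one does not. The sketch also hints at a limit of the approach: a declaration whose modifiers come in an order the pattern does not list, such as "static public function", would not be tagged.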

Only show class member variables

With the above patch ctags will correctly tag functions and classes, but variables are still not handled the way I like. The standard implementation tags all variables, including temporary and local variables. I prefer that ctags tags only class member variables. The patch follows roughly the same logic as above: a variable is treated as a class variable if it is preceded by a keyword such as var, static, public and so on.

Download php-member-variables-only-v2.patch

    --- exuberant-ctags-5.7-multiline/php.c 2009-07-07 17:34:27.000000000 +0200
    +++ exuberant-ctags-5.7-members/php.c   2009-07-07 17:37:29.000000000 +0200
    @@ -74,8 +74,8 @@
                    "\\2", "d,define,constant definitions", NULL);
            addTagRegex(language, "(^[ \t]*)(public[ \t]+|protected[ \t]+|private[ \t]+)?(static[ \t]+)?function[ \t]+&?[ \t]*([" ALPHA "_][" ALNUM "_]*)",
                    "\\4", "f,function,functions", NULL);
    -       addTagRegex(language, "(^|[ \t])\\$([" ALPHA "_][" ALNUM "_]*)[ \t]*=",
    -               "\\2", "v,variable,variables", NULL);
    +       addTagRegex(language, "(^[ \t]*)(var[ \t]+|public[ \t]+|protected[ \t]+|private[ \t]+)?(static[ \t]+)?\\$([" ALPHA "_][" ALNUM "_]*)[ \t]*=",
    +               "\\4", "v,variable,variables", NULL);

            /* function regex is covered by PHP regex */
            addTagRegex (language, "(^|[ \t])([A-Za-z0-9_]+)[ \t]*[=:][ \t]*function[ \t]*\\(",
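
The variable pattern can be tried out in the same way. This is again a minimal Python sketch under the same assumptions as before: ASCII ranges stand in for the ALPHA and ALNUM macros, Python's re module stands in for the regular expression library used by ctags, and the helper name variable_tags and the sample code are hypothetical.

```python
import re

# Assumption: ASCII stand-ins for the ALPHA and ALNUM macros used in php.c.
NAME = r"[A-Za-z_][A-Za-z0-9_]*"

# Variable pattern before the patch: any "$name =" after a space, tab or line start.
OLD_VARIABLE = re.compile(r"(^|[ \t])\$(" + NAME + r")[ \t]*=")

# Variable pattern after the patch: only whitespace and the listed
# modifiers may appear between the start of the line and "$name =".
NEW_VARIABLE = re.compile(
    r"(^[ \t]*)(var[ \t]+|public[ \t]+|protected[ \t]+|private[ \t]+)?(static[ \t]+)?"
    r"\$(" + NAME + r")[ \t]*="
)

# Hypothetical PHP source: one assignment in a docblock, two member variables.
SAMPLE = """\
<?php
/**
 * Usage: $foo = new Foo();
 */
class Foo
{
    public $bar = 1;
    private static $count = 0;
}
"""


def variable_tags(pattern, group, source):
    """Apply the pattern to each line separately, as a line-oriented parser does."""
    found = []
    for line in source.splitlines():
        match = pattern.search(line)
        if match:
            found.append(match.group(group))
    return found


print(variable_tags(OLD_VARIABLE, 2, SAMPLE))  # ['foo', 'bar', 'count']
print(variable_tags(NEW_VARIABLE, 4, SAMPLE))  # ['bar', 'count']
```

Two properties of the pattern as written are worth knowing. The modifier groups are optional, so an indented assignment that starts its own line, such as "$tmp = 1;", still matches. A member declared without an initial value, such as "public $bar;", does not match because the pattern requires an equals sign.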

Patching exuberant-ctags on Debian

On Debian and its derivatives it is quite easy to build your own custom packages. Let's start by installing everything we need to build exuberant-ctags ourselves.

    ~$ sudo aptitude build-dep exuberant-ctags

Next we create a working directory and download the source package.

    ~$ mkdir ctags
    ~$ cd ctags
    ~/ctags$ aptitude source exuberant-ctags

I am assuming that you have downloaded the two patch files to your desktop. Apply them like this.

    ~/ctags$ cd exuberant-ctags-5.7
    ~/ctags/exuberant-ctags-5.7$ patch -p1 < ~/Desktop/php-multiline-comment-v2.patch
    ~/ctags/exuberant-ctags-5.7$ patch -p1 < ~/Desktop/php-member-variables-only-v2.patch

Now you need to increase the version number of the package so that you can later install it. You can use dch to do this. It reads the changelog, adds your name to it and opens your preferred editor so that you can edit the entry.

    ~/ctags/exuberant-ctags-5.7$ dch

I edited my changelog to look like this. Notice the “+multiline” that I added to the version number at the top.

    exuberant-ctags (1:5.7-4+multiline) unstable; urgency=low

      [ Colin Watson ]
      * Update DEB_BUILD_OPTIONS parsing code from policy 3.8.0.
      * Backport from upstream (closes: #484797, SF #1878155):
        - jscript.c was not properly handling escaped quotes.
      * Add a Homepage control field.
      * Refer to /usr/share/common-licenses/GPL-2 in debian/copyright rather
        than plain GPL.
      * Policy version 3.8.0.

      [ Sander Marechal ]
      * Try to avoid matching multiline PHP comments

     -- Sander Marechal <s.marechal@jejik.com>  Tue, 25 Nov 2008 14:09:00 +0100

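Now we can build the package. The -b flag tells debuild that we only want to build a binary package. The -us and -uc flags tell debuild that you do not wish to sign the package with your GPG key. The -i flag is passed on to dpkg-source, where it filters version-control files out of a source package, so it makes no difference for a binary-only build. By default debuild also runs lintian to check the finished package.

    ~/ctags/exuberant-ctags-5.7$ debuild -i -us -uc -b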

Exuberant-ctags should build without errors. Now you can install the package. The file name ends in the architecture you built for, which was i386 in my case.

    ~/ctags/exuberant-ctags-5.7$ cd ..
    ~/ctags$ sudo dpkg -i exuberant-ctags_5.7-4+multiline_i386.deb

The last thing to do is to tell your package manager that you want to keep exuberant-ctags at the version you just installed and not replace it with an updated version. Changing the selection state requires root privileges.

    echo 'exuberant-ctags hold' | sudo dpkg --set-selections

And that’s it! Now you should have much better PHP support in taglist in vim. See the screenshot below for an example. Notice how there is no function called “in” and no variable called “$foo” in the taglist.

Creative Commons Attribution-ShareAlike

Comments

#1 Rys (http://pixeltards.com/)

Hi Sander,

Thanks for the patches, they've definitely improved my PHP dev experience under vim. Would you also be willing to share your vim colorscheme? It looks great, very easy on the eye.

Thanks!

#2 Sander Marechal (http://www.jejik.com)

Hi Rys,

I shared my vim colorschemes (and other plugins) in a previous article already. It's the "Wombat" colorscheme. See my article “My list of must-have vim scripts”.

#3 David Mudrak

Thanks for the patches. Those using gentoo may look at
http://bugs.gentoo.org/show_bug.cgi?id=276902

#4 Sander Marechal (http://www.jejik.com)

Thanks David. I have updated the article and my patches with your improvements.

#5 Bryan Gruneberg (http://www.perceptum.biz)

Hi

Thanks for this!

The multiline comment problem popped up in the ctags-5.8 MacPorts distro as well. Unfortunately the patch didn't apply cleanly to the 5.8 code base, so I merged the changes and created a new patch file for the ctags-5.8 MacPorts distribution.

I've posted it on my blog @ http://www.drupaler.co.za/macports-patch-ctags-58-phpc if anyone needs it.

It seems to work ok, but I'm not sure if I have broken anything in the process.

Thanks again for the post!

Bryan