Omnigia

May 19, 2008

Debian / Ubuntu packaging: Zorba XQuery

Filed under: linux, c, debian, xquery, zorba — Dan Muresan @ 1:57 pm

Today I uploaded Ubuntu source and binary (Gutsy and Hardy) packages for Zorba, the new C++ streaming XQuery processor. The Ubuntu PPA system (Personal Package Archives) is a great service; without it, you’d need to host an APT repository in order to conveniently distribute packages that are not (yet) part of Debian or Ubuntu (especially since a Debian source package is actually three files).

In fact, my source package works in Debian unstable too; as there is no custom Debian Sid APT repository (Ubuntu PPA only serves Ubuntu distros), here’s what you need to do to build and install it:

  • dget the .dsc file (which pulls the original tarball and a .diff.gz as well)
  • run pbuilder build zorbaxquery_0.9.1-3.dsc (apt-get install and set up pbuilder first if you don’t have it)
  • retrieve the .deb’s from /var/cache/pbuilder/result/

It would be really nice if someone set up a PPA-like service for Debian, at least for repositories of source packages. I realize that setting up a cluster of build boxes is possible only with someone like Canonical behind it. But the storage required for source packages could be quite small: if the *.orig.tar.gz “link” dynamically retrieved an archive hosted elsewhere (a webapp could do this, trading space for bandwidth), such repositories could be quite compact (the .dsc and .diff.gz files are usually tiny). Alternatively, this scheme might work with a modified apt that could follow HTTP redirects.

April 30, 2008

gdb: examining complex c++ objects

Filed under: linux, c, gdb — Dan Muresan @ 8:25 am

I’ve been doing quite a bit of C++ programming (and, alas, debugging) for a project lately. One endless source of annoyance in C++ (at least on Linux) is the impedance mismatch between the compiler (gcc) and the debugger (gdb). C++ is notoriously hard to compile (and even just to parse). gdb does a bit of name demangling, but quickly finds itself out of its depth with complex C++ features (like heavy template usage). This is, after all, an old problem: even with C programs, debugging macro-laden code is painful.

But I’m not going to get into the details of that; today I’m going to show you how to deal with a lesser annoyance, namely examining STL objects. For example, if you use gdb’s standard print (or p) command, strings look like a mess, and long ones are truncated:

#include <iostream>
#include <string>
using namespace std;

int main () {
  string s = "";
  for (int i = 0; i < 5; i++)
    s += "This is a very, very long line.\n";
  s = s + s;
  cout << s;
  return 0;
}
~$ gdb testprog
(gdb) b 12
Breakpoint 1 at 0x8048c18: file testprog.cc, line 12.
(gdb) r
Breakpoint 1, main () at x.cc:12
12	  cout << s;
(gdb) p s
$1 = {static npos = 4294967295,
  _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> =
    {<No data fields>}, <No data fields>},
    _M_p = 0x804b3ac "This is a very, very long line.\nTh"...}}

(OK, I had to truncate the string manually here for readability, but the exact value of the maximum width isn’t the issue.) A much better way to examine strings is to use gdb’s printf command on the appropriate member of the STL object:

(gdb) printf "%s\n", s._M_dataplus._M_p

This displays the actual string, without encoding newlines as \n. An even better way is to define a printstr command that you can reuse in future gdb sessions; create or edit the file ~/.gdbinit and add the following snippet:

define printstr
  printf "`%s'\n", $arg0._M_dataplus._M_p
end

This will allow you to simply say printstr s whenever you need to examine a string. Of course, this definition relies upon GCC’s internal representation of a std::string, which may change from time to time.

After developing this gdb macro, I discovered Dan Marinescu’s excellent STL Views gdb scripts, which add support for examining vectors, maps, sets (and, yes, strings). The idea is the same. If you spend any significant time inside gdb, this is an invaluable tool.

It’s probably a good idea to take this further and create similar printer functions for all complex (and frequently examined) classes in C++ projects. GDB’s user-defined commands are extensively documented in the manual. You don’t need to put such commands in ~/.gdbinit; you can create a separate script and load it using source scriptname.gdb when needed.

May 11, 2007

Why the C preprocessor is a good thing

Filed under: c, macros — Dan Muresan @ 8:04 am

Yesterday, Christian Kienle argued that the C preprocessor is a bad thing. When a language lacks closures and garbage collection and forces static typing without type inference on its users, you would think that a moderately powerful feature like preprocessor macros would get some respect, at least in these times of programming-language renaissance when there are so many good alternatives.

First of all, I believe that Paul Graham’s advice holds true in any language: macros should only be used when nothing else will do. But when that happens, avoiding macros leads to contorted or verbose solutions.

Let’s look at Christian’s arguments:

Debugging preprocessor macros is hard

It’s true that most debuggers can’t map compiled code back to the original macros. However, most debugging is (or should be) done outside debuggers, and debugging would be hard without the preprocessor:

  • the preprocessor provides the __FILE__ and __LINE__ macros. Yes, they could be predefined identifiers, just like C99’s __func__, but that’s actually a less flexible solution: since C concatenates adjacent string literals, you can write "error in " __FILE__, but you can’t do that with __func__
  • assert can only be written as a macro, since it needs to stringify the condition being tested
  • Without macros, you’d be forced to invoke logging primitives like in Java:
    if (logger.isDebugEnabled()) {
      logger.debug(expensive_function ());
    }
  • using macros and RAII in C++, you can write a tracing system

Preprocessor macros are not type-safe

True, and it’s the closest thing that C/C++ have to type inference. Christian doesn’t actually show how this supposed type-unsafety can bite you, but instead points to the next reason and suggests that you use templates in C++ (or NSNumbers in Objective C). I don’t know about Objective C, but

template <class T> T min (T x, T y)
{ return x < y ? x : y; }

looks pretty verbose to me.

Preprocessor macros often lead to side effects

What this means is that macro arguments can appear multiple times in the macro-expansion:

#define MAX( a, b ) ((a) > (b) ? (a) : (b))

If one of the arguments is an expression with side effects (such as x++ or a function call that modifies some state), then we have a bug. This is true, but

  • programming with side effects is not a good practice. Even if you don’t have the luxury to program in a functional language, you should still strive to minimize reliance on side-effects
  • macros are usually given capitalized names, like MAX, just so they scream at you when you are about to type MAX (x++, f (y))
  • if one of the arguments is a function f(), but f has no side effects, the compiler may be able to optimize away redundant multiple invocations
  • you get what you pay for — this is not Lisp, after all.

Of the three arguments against macros, only the last one is actually a serious objection; and just because the C preprocessor is too weak doesn’t mean you shouldn’t use it when necessary.

Finally, for fun, I’d like to point you to some macro magic:

If you have other cool examples, feel free to add them in the comments.
