Thursday, November 21, 2013

Understanding the QWidget layout flow

When layouts in a UI are not behaving as expected or performance is poor, it can be helpful to have a mental model of the layout process in order to know where to start debugging.  For web browsers there are some good resources which provide a description of the process at different levels. The layout documentation for Qt describes the various layout facilities that are available but I haven't found a detailed description of the flow, so this is my attempt to explain what happens when a layout is triggered that ultimately ends up with the widgets being resized and repositioned appropriately.

  1. A widget's contents are modified in some way that requires a layout update. Such changes can include:
    • Changes to the content of the widget (eg. the text in a label, content margins being altered)
    • Changes to the sizePolicy() of the widget
    • Changes to the layout() of the widget, such as new child widgets being added or removed
  2. The widget calls QWidget::updateGeometry() which then performs several steps to trigger a layout:
    1. It invalidates any cached size information for the QWidgetItem associated with the widget in the parent layout.
    2. It recursively climbs up the widget tree (first to the parent widget, then the grandparent and so on), invalidating each widget's layout along the way. The process stops when we reach a widget that is a top-level window or doesn't have its own layout - we'll call this widget the top-level widget, though it might not actually be a window.
    3. If the top-level widget is not yet visible, then the process stops and layout is deferred until the widget is due to be shown.
    4. If the top-level widget is shown, a LayoutRequest event is posted asynchronously to the top-level widget, so a layout will be performed on the next pass through the event loop.
    5. If multiple layout requests are posted to the same top-level widget during a pass through the event loop, they will get compressed into a single layout request. This is similar to the way that multiple QWidget::update() requests are compressed into a single paint event.
  3. The top-level widget receives the LayoutRequest event on the next pass through the event loop. This can then be handled in one of two ways:
    1. If the widget has a layout, the layout will intercept the LayoutRequest event using an event filter and handle it by calling QLayout::activate()
    2. If the widget does not have a layout, it may handle the LayoutRequest event itself and manually set the geometry of its children.
  4. When the layout is activated, it first sets the fixed, minimum and/or maximum size constraints of the widget depending on QLayout::sizeConstraint(), using the values calculated by QLayout::minimumSize(), maximumSize() and sizeHint(). These functions will recursively proceed down the layout tree to determine the constraints for each item and produce a final size constraint for the whole layout.  This may or may not alter the current size of the widget.
  5. The layout is then asked to resize its contents to fit the current size of the widget using QLayout::setGeometry(widget->size()). The specific implementation of the layout - whether it is a box layout, grid layout or something else then lays out its child items to fit this new size.
  6. For each item in the layout, the QLayout::setGeometry() implementation will typically ask the item for various size parameters (minimum size, maximum size, size hint, height for width) and then decide upon a final size and position for the item. It will then invoke QLayoutItem::setGeometry() on the item to update the position and size of the widget.
  7. If the layout item is itself a layout or a widget, steps 5-6 proceed recursively down the tree, updating all of the items whose constraints have been modified.
A layout update is an expensive operation, so there are a number of steps taken to avoid unnecessary re-layouts:
  • Multiple layout update requests submitted in a single pass through the event loop are coalesced into a single update
  • Layout updates for widgets that are not visible and layouts that are not enabled are deferred until the widget is shown or the layout is re-enabled
  • The QLayoutItem::setGeometry() implementations will typically check whether the current and new geometry differ or whether they have been invalidated in some way before performing an update. This prunes parts of the widget tree from the layout process which have not been altered.
  • The QWidgetItem associated with a widget in a layout caches information which is expensive to calculate, such as sizeHint(). This cached data is then returned until the widget invalidates it using QWidget::updateGeometry()
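
To make the coalescing and caching behaviour above a little more concrete, here is a minimal sketch in plain C++ with no Qt dependency. ToyWidget, ToyEventLoop and the free function updateGeometry() are hypothetical names invented for this illustration. They are not Qt's implementation, only a model of "invalidate the cache, then post at most one layout request per pass through the event loop".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Toy stand-in for a top-level widget (hypothetical, not a Qt class).
struct ToyWidget {
    bool visible = true;
    bool layoutRequestPending = false;
    bool sizeHintCached = false;
    int layoutPasses = 0;         // how many times the layout was activated
    int sizeHintComputations = 0; // how many times the hint was recalculated

    int sizeHint() {
        if (!sizeHintCached) {
            ++sizeHintComputations; // stands in for an expensive calculation
            sizeHintCached = true;
        }
        return 100;
    }
};

// Toy event loop: a FIFO queue of deferred callbacks.
struct ToyEventLoop {
    std::queue<std::function<void()>> events;

    void post(std::function<void()> event) { events.push(std::move(event)); }

    // One pass: run only the events that were queued before the pass began.
    void processEvents() {
        std::size_t count = events.size();
        while (count-- > 0) {
            std::function<void()> event = std::move(events.front());
            events.pop();
            event();
        }
    }
};

// Rough analogue of QWidget::updateGeometry(): invalidate cached size
// information, then post at most one layout request per event loop pass.
void updateGeometry(ToyWidget& w, ToyEventLoop& loop) {
    w.sizeHintCached = false;
    if (!w.visible) return;              // deferred until shown (not modelled here)
    if (w.layoutRequestPending) return;  // compressed into the pending request
    w.layoutRequestPending = true;
    loop.post([&w] {
        assert(w.layoutRequestPending);
        w.layoutRequestPending = false;
        w.sizeHint();     // recomputed once, then served from the cache
        ++w.layoutPasses; // stands in for QLayout::activate()
    });
}
```

Calling updateGeometry() three times before processEvents() leaves a single event in the queue, so the toy layout runs once and the size hint is recalculated once.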

Given this flow, there are a few things to bear in mind to avoid unexpected behaviour:
  • Qt provides multiple ways to set constraints such as fixed and minimum sizes.
    • Using QWidget::setFixedSize(), setMinimumSize() or setMaximumSize(). This is simple and available whether you control the widget or not.
    • Implementing the sizeHint() and minimumSizeHint() functions and using QWidget::setSizePolicy() to determine how these hints are handled by the layouts. If you control the widget, it is almost always preferable to use sizePolicy() together with the layout hints.
  • The layout management documentation suggests that handling LayoutRequest events in QWidget::event() is an alternative to implementing a custom layout. A potential problem with this is that LayoutRequest events are delivered asynchronously on the next pass through the event loop. If your widget is likely to update its own geometry in response to the LayoutRequest event then this can trigger layout flicker where several passes through the event loop occur before the layout process is fully finished. Each of the intermediate stages will flicker on screen briefly, as the event loop may process a paint event on each pass as well as the layout update, which looks poor. So if you need a custom layout, subclassing QLayout/QLayoutItem is the recommended approach unless you're sure that your widget will always be used as a top-level widget.

Monday, November 4, 2013

Improving build times of large Qt apps

My colleagues and I spent time recently improving build times of a largish Qt app (Mendeley) and its associated test suite. I'm sharing some notes here in case anyone else finds them useful. Most of the steps here fall under one of a few basic ideas:
  • Measure first
  • Do more in parallel
  • Work around the inefficiencies of C++ compilation
  • Use faster tools
  • Do less disk I/O
All of these steps can improve build times on all platforms, but those that reduced the amount of I/O during builds were especially effective on Windows.

Measure first

When we started out, I expected that running the tests would be consuming most of our CI system's cycle time. In the end it turned out that the largest bottleneck was actually just building the code on Windows, which was taking 3x as long as on Linux (30 mins for a fresh build vs 10 mins on Linux). The unit tests also took longer to run on Windows, by a factor of 2 (20 mins total vs 10 mins on Linux).

Use those cores!

One of the simplest things to address is usually taking advantage of multiple cores on your system. The '-j' argument to make sets the number of parallel jobs. The optimal number will depend on a number of factors. Setting the value to the number of cores is a reasonable starting point, but check what happens with different values.

When running unit tests, use the option in the driver to run multiple tests in parallel. ctest supports a '-j' argument for this as well. An important thing to remember before enabling this is that your tests need to be set up so that they can't interfere with one another. This means not trying to use the same resources (files, settings keys, I/O ports, web service accounts etc.) at the same time. Some tests might be easier to isolate than others, in which case you can split your test suite into subsets and only run some of the subsets in parallel. ctest has a facility for assigning labels to tests using the LABELS test property.


CTest then has a set of command-line arguments that can be used to run only tests with labels matching a certain pattern, or exclude tests with labels matching a certain pattern. This can then be used to run only a subset of tests which are known not to interfere with one another concurrently.

Working around C++ compilation inefficiency

When the compiler encounters an #include statement, it effectively copies and pastes the content into the current source file. The resulting output that the compiler has to lex, parse and understand the semantics of ends up being tens of thousands of lines long in the case of a typical source file in a Qt app. The more you use code-heavy headers such as the C++ standard library or Boost, the worse this gets. This is incredibly inefficient and means that much of your build time can be spent re-parsing the same source code over and over. This is compounded by the complexity of parsing C++ in the first place.

Consider this very simple list view app.  There are only 15 lines of actual code in the example but the preprocessed output, which can be produced by passing the -E flag to gcc, is just under 43,000 lines of actual code (as determined by sloccount) or just under 60,000 lines when C++11 mode is enabled (using the '-std=c++0x' flag).

In a language with a proper package/module system (eg. C#, Go or many other languages), processing an import only involves reading some metadata from the already-compiled module rather than re-parsing everything. A proper module system for C++ is in the works but is still some way off. In the meantime, there are workarounds available which can help considerably.

Precompiled headers

MSVC, GCC and Clang all have good support for precompiled headers. The use of precompiled headers is even more important now since the preprocessed output of many of the #includes from the C++ standard library grows considerably in size when C++11 is enabled. Note that under MSVC on Windows, C++11 mode is always enabled.

With the small example above, creating a precompiled header which includes just the QStringList header reduces compile times for the main .cpp file on my system from ~1.1s to ~0.7s (about 35%). This sounds modest but adds up by the time you have a project with hundreds of source files. Even in a small project with just a few dozen source files I think it is worthwhile.

The steps to enable precompiled headers will depend on the build system you are using. With qmake, this is relatively simple. CMake lacks a simple built-in command for this but there are samples online that we used as a basis.

A downside of precompiled headers is that you are effectively automatically #including an extra header with every file that you build. A file may therefore compile in a build with precompiled headers but fail in one without, if it is missing necessary #includes that the precompiled header happens to supply. If you're running a CI system it is therefore useful to have at least one regular build that is not using precompiled headers.

Unity builds

A unity build involves creating a single source file which #include's all the source files from a particular module or the whole project and compiling that at once. The main caveat with this approach is that variables and functions declared within an implementation .cpp file may now clash with declarations from other source files - since they are now being compiled together as a single source file instead of separately.
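
As a rough illustration of the mechanics, the unity source file is nothing more than a list of #include lines. The sketch below generates that text from a list of file names. makeUnitySource() is a hypothetical helper written for this post, not part of any build system, and in practice the build system would usually generate the file for you.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the text of a unity source file that #includes each listed .cpp file.
// (Hypothetical helper for illustration only.)
std::string makeUnitySource(const std::vector<std::string>& sourceFiles) {
    std::string out = "// Generated unity source: do not edit\n";
    for (const std::string& file : sourceFiles) {
        assert(!file.empty()); // an empty name would produce a broken #include
        out += "#include \"" + file + "\"\n";
    }
    return out;
}
```

The caveat above follows directly from this: if, say, two of the listed files each define a file-local helper with the same name, both definitions now end up in the same translation unit and the compiler reports a redefinition.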

More efficient build tools

Part of the reason for a gradual creep in build times as a project grows is scaling issues with build tools. The amount of time taken for a do-nothing build (ie. running 'make' when everything is up to date) grows noticeably with cmake + make as the total number of targets to build increases. Fortunately for us, engineers on Google Chrome ran into this problem harder and long before we did so they have produced some helpful replacements for the standard tools:
  • The Ninja build system is designed to be faster, especially for incremental builds where little changed. Recent versions of CMake have built-in support for generating Ninja build files (use 'cmake -G Ninja' to generate Ninja build files). The difference in build speed for incremental builds where little changed is decent on Mac and Linux but very noticeable on Windows compared to nmake. Prior to Ninja, Qt developers also created jom as a faster alternative to make.
  • On Linux, the Gold linker is faster than the traditional ld linker and can often be used as a drop-in replacement.

Reducing total disk I/O

Disk I/O is very slow, so reducing the total amount of I/O (especially random I/O) required during a build can improve overall build times substantially. Anecdotally, this is especially true on Windows, where reducing the total amount of I/O performed during a clean build had the largest impact in terms of achieving parity between build + test times on Windows and build times on Linux and Mac.

Use faster hardware

It always feels a little dirty to solve software inefficiency by throwing faster hardware at the problem but if you can afford it, it can be a quick win.
  • Adding more memory will reduce the likelihood of the build system swapping.
  • A good SSD drive will speed up disk I/O, especially for operations which do a lot of random I/O.
  • If you have a lot of memory spare you can create a RAMDisk and do the build on that.
I haven't compared the impact of an SSD vs. a standard IDE drive myself; this advice comes mostly from the Chromium developers' build notes.

Reducing debug info size

In debug builds, a large proportion of the total size of data read/written from disk is typically debug information. When doing local development, this information is usually useful. When generating builds on a continuous integration system that will purely be used for automated tests, this is less so.
  • All compilers (MSVC, gcc, clang) have switches to control the amount of debug info that is generated. With gcc/clang these are controlled by the -gXYZ switches.

Generating fewer binaries for tests

For every binary that is generated as part of a project, there are a number of overheads:
  • Each binary will add a number of additional targets to the build system
  • Each binary requires a linking step - which can be memory and I/O intensive.
  • Each binary generated requires reading/writing additional data to disk. The cost of this depends on how large the generated binary is and how many files need to be processed to assemble the final binary.
In our case, we are using the QTestLib framework for unit tests, which by default encourages the creation of one test class per original class. Each test class is then compiled into a separate binary with a QTEST_MAIN($TEST_CLASS_NAME) macro providing the entry point for the test app. This works fine for smaller apps. When a project grows larger however and you have hundreds of test classes, the overhead of linking all of those binaries can add noticeably to the total build time.

We changed the test builds to produce one test binary per source directory instead of one per test class. This was done by replacing the QTEST_MAIN() macro with a substitute which instead declares a '$TESTCLASS_main()' function and registers it in a global map from test class to init function on startup. All of the test classes are then compiled and linked together with a small stub library which defines the 'int main()' function. This reads the name of the test to run from the command-line and calls the corresponding '$TESTCLASS_main()' function, forwarding the other command-line arguments to it. This allows multiple Qt test cases to be linked into a single binary which improves build times in several ways:
  • The number of linking operations during builds was considerably reduced.
  • The total amount of binary data generated on disk was reduced as code that was previously statically linked into the test binary for each test class is now only linked into a single test binary for each group of tests.
  • The total number of make steps and targets for the whole project was reduced.
On Windows this change shaved 30% off our total build time and the impact on build times of adding a new test case is now greatly reduced.
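
The registration scheme can be sketched roughly as follows. This is a simplified stand-in written in plain C++ without Qt, not the code we actually use: REGISTER_TEST_MAIN, testRegistry(), runNamedTest() and the two toy test classes are all hypothetical names, and where a real version would call QTest::qExec() the sketch just calls a run() method on the test object.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using TestMain = std::function<int(const std::vector<std::string>&)>;

// Global map from test class name to its entry point. A function-local
// static avoids static initialization order problems between files.
std::map<std::string, TestMain>& testRegistry() {
    static std::map<std::string, TestMain> registry;
    return registry;
}

// Registers an entry point when a static instance is constructed at startup.
struct TestRegistrar {
    TestRegistrar(const std::string& name, TestMain fn) {
        assert(testRegistry().count(name) == 0); // each name registered once
        testRegistry()[name] = std::move(fn);
    }
};

// Hypothetical substitute for QTEST_MAIN(): defines TestClass_main() and
// registers it, instead of defining the program's main().
#define REGISTER_TEST_MAIN(TestClass)                                         \
    static int TestClass##_main(const std::vector<std::string>& args) {       \
        TestClass test;                                                       \
        return test.run(args); /* a Qt version would call QTest::qExec() */   \
    }                                                                         \
    static TestRegistrar TestClass##_registrar(#TestClass, TestClass##_main);

// Two toy test classes. TestParser returns the number of forwarded
// arguments so that the forwarding is observable.
struct TestParser {
    int run(const std::vector<std::string>& args) { return static_cast<int>(args.size()); }
};
REGISTER_TEST_MAIN(TestParser)

struct TestNetwork {
    int run(const std::vector<std::string>&) { return 0; }
};
REGISTER_TEST_MAIN(TestNetwork)

// Body of the stub main(): args[0] names the test class to run and the
// remaining arguments are forwarded to it. Returns -1 for unknown names.
int runNamedTest(const std::vector<std::string>& args) {
    if (args.empty()) return -1;
    auto it = testRegistry().find(args[0]);
    if (it == testRegistry().end()) return -1;
    return it->second(std::vector<std::string>(args.begin() + 1, args.end()));
}
```

The stub library's main() would then only need to copy argv[1..] into a vector and return runNamedTest() on it, so adding a new test class costs one extra object file rather than one extra link step.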

Generating smaller binaries

Another way to reduce the size of compiled binaries is to build each module of the app into a shared rather than static library. This is sometimes referred to as a 'component build'. When there are many executables being generated from the same source code, this reduces the amount of work for the linker and the amount of I/O, because the shared code and associated debug info are generated only once when building the shared library/DLL, instead of being linked separately into each binary.

Note that by doing this you are deferring some of the linking work from build time to runtime and consequently startup will slow down as the number of dynamically loaded libraries increases.

Further reading

I hope these notes are useful - please let me know if you have other recommendations in the comments. In the meantime, here are a few notes for existing projects which I found useful background reading:

  • Notes on accelerating Chromium builds on Windows, Linux and Mac - this doesn't involve Qt but the advice is still quite relevant.
  • Notes on improving Firefox's build system.
  • An explanation of how a language designed with build performance in mind differs from C++

Tuesday, June 25, 2013

qt-signal-tools 0.2

A new version of the qt-signal-tools library for connecting signals to arbitrary functions is available.

Changes in this release:

  • Compatibility with earlier versions of Qt 4.  The previous release required Qt 4.8.  The current version works with Qt 4.6 and up and possibly older releases as well.
  • Compatibility with Qt 5.  Though the functionality of QtSignalForwarder can be mostly achieved in Qt 5 using the new signal/slot syntax, this may be useful for creating code which can work with either version or for porting.
  • Performance improvements.
  • QtSignalForwarder::connectWithSender() utility. This provides a convenient way to connect a signal to a slot which takes the sender as its first argument.  eg. connectWithSender(button, SIGNAL(clicked()), form, SLOT(buttonClicked(QPushButton*)))

The performance improvements come from changing the way that the hidden proxy object which forwards the signal determines where the signal came from.  The previous implementation worked in the same way as QSignalMapper by using QObject::sender() and QObject::senderSignalIndex() to determine which signal the proxy was handling.  These two functions have some overhead though.  Both not only lock a mutex but there is also a linear slowdown as the number of senders connected to a given receiver increases.  The previous version of QtSignalForwarder therefore created a new proxy object for each sender.

So I looked for an alternative way to identify the caller of the slot.  When a signal -> slot connection is established, Qt internally maps the arguments to the SIGNAL() and SLOT() macros to integer method IDs.  The details of the connection, including the receiver, connection type and method IDs of the signal/slot are then stored in a connection object and added to a list maintained by the sender.  When a signal is emitted, Qt invokes the qt_metacall() function provided by the receiver's QMetaObject and passes in the kind of action to perform (property read, property write, method call), a method/property ID and a list of arguments.  This function then forwards the arguments to the actual signal/slot method corresponding to the method ID.

The method IDs are normally assigned by moc when it processes a header file and generates the QMetaObject object that is used for all of Qt's introspection features.  However, it is possible to specify the method IDs directly when creating a connection by using QMetaObject::connect(sender, signalMethodId, receiver, receiverMethodId...).  I'm now abusing the receiver method ID by assigning a new ID to each connection.  A caveat is that internally method IDs are stored as 16-bit unsigned integers to save space, since a single QObject-based class would normally have tens of methods at most.  This means there is an upper limit of ~65K unique tags that can be used to identify the connection being invoked.

After removing the use of sender() and senderSignalIndex() in QtSignalForwarder the same proxy object can be re-used for a larger number of senders/receivers.  A caveat is that we now have to be more careful about how this is used in the context of multiple threads.  For now I've kept things simple by restricting the use of QtSignalForwarder::connect() to objects which live on the main application thread, which is not a problem for many practical purposes.  When this does need to be used with an object that lives on a background thread, a new QtSignalForwarder instance can be created and the bind() function used directly.
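
The idea of using the receiver method ID as a per-connection tag can be modelled without Qt. In the sketch below, ToyForwarder, bind() and invoke() are hypothetical names for illustration only. The real proxy receives the ID through qt_metacall() rather than a direct call, and nothing here is taken from the library's source. The sketch only shows one proxy dispatching many bindings by tag, with the 16-bit limit made explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Tags must fit in a 16-bit unsigned integer, so at most 65536 bindings.
constexpr std::size_t kMaxTags = 65536;

// Toy proxy: one object serves many connections, each identified by a tag.
class ToyForwarder {
public:
    // Stores the callback and returns its tag, or -1 if tags are exhausted.
    int bind(std::function<void()> callback) {
        if (callbacks_.size() >= kMaxTags) return -1;
        callbacks_.push_back(std::move(callback));
        return static_cast<int>(callbacks_.size() - 1);
    }

    // Rough analogue of a method-call dispatch: the tag alone identifies
    // which binding fired, so no sender lookup is needed.
    void invoke(std::uint16_t tag) {
        assert(tag < callbacks_.size());
        callbacks_[tag]();
    }

private:
    std::vector<std::function<void()>> callbacks_;
};
```

Dispatch is a single vector lookup per call regardless of how many senders are attached to the proxy. The sketch does no locking of its own, which mirrors why the shared proxy has to be restricted to a single thread.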

Saturday, February 16, 2013

qt-signal-tools - Pre-packaged slot calls and connecting signals to arbitrary functions in Qt 4

A useful new feature in Qt 5 is the ability to connect signals to arbitrary functions instead of just Qt signals/slots/properties, including C++11 lambdas.  As this page on the Qt Project wiki explains, this is especially useful when writing code to perform async operations where you often want to pass additional context to the slot.

I've written a small library for Qt 4 which provides similar functionality. The library includes:
  • QtCallback - A pre-canned QObject method call. QtCallback stores an object, a slot to call and optionally a list of pre-bound arguments to pass to the slot. This is useful if you need to pass additional context to the slot, other than the values provided by the signal.
  • QtSignalForwarder::connect() - Connects signals from QObjects to arbitrary functions or methods or QtCallbacks. You can use this together with bind() and function<> to pass additional arguments to the method other than those provided by the signal or re-arrange arguments. You can think of this as a more flexible alternative to the QSignalMapper class that Qt 4 provides. There are also a couple of utility features:
    • QtSignalForwarder::delayedCall() - A more flexible alternative to QTimer::singleShot() which can be used to invoke an arbitrary function after a set delay.
    • Event connections - Invoke an arbitrary function or QtCallback when an object receives a particular type of event. This is useful when the object does not have a built-in signal that is emitted in response to that event and requires less boilerplate than using QObject::installEventFilter()
  • safe_bind() - A downside of connecting a signal to a function object is that the signal does not automatically disconnect if the receiver is destroyed. safe_bind() creates a wrapper around a (QObject*, function) pair which when called, invokes the function if the object still exists or does nothing and returns a default value if the object has been destroyed. You can use this together with QtSignalForwarder to connect a signal to an arbitrary method on a QObject which effectively 'disconnects' when the receiver is destroyed.
For example usage, please see the README, the examples and the tests. The code is available from  The requirements are:
  • Qt 4.8
  • A compiler with the TR1 standard library extensions (most C++ compilers from the past few years - including MSVC >= 2008 and GCC 4.x. I have tested with MSVC 2010 and recent GCC/Clang versions) or one which supports equivalent features from the C++11 standard library.
Compared to the implementation in Qt 5, there are a few disadvantages:
  • Argument type checking happens at runtime when QtSignalForwarder::connect() is called, similar to standard QObject signal-slot connections. QObject::connect() in Qt 5 can do type checking at compile time.
  • In order to do the runtime type checking, the types of arguments passed from the signal to the function or method must be registered using Q_DECLARE_METATYPE() or qRegisterMetaType()
  • Using QtSignalForwarder does have additional overhead since a hidden proxy object is created to route the signal and arguments to the target function. I investigated using a single proxy object for all forwarded signals or a pool of proxies. Unfortunately it turns out that the QObject::sender() and QObject::senderSignalIndex() functions which are used internally have a cost that is linear in the number of connections.
Please let me know if you find this useful. If there is any other related functionality which you'd like to see, please let me know in the comments.
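
For readers curious about the idea behind safe_bind(), here is a minimal sketch of the same pattern in standard C++. It is not the library's implementation: std::weak_ptr stands in for whatever QObject lifetime tracking the library uses, and safeBind() and Counter are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Wraps (object, method) so that the call becomes a no-op returning a
// default-constructed value once the object has been destroyed.
template <typename T, typename R>
std::function<R()> safeBind(std::weak_ptr<T> receiver, R (T::*method)()) {
    assert(method != nullptr);
    return [receiver, method]() -> R {
        if (std::shared_ptr<T> alive = receiver.lock()) {
            return ((*alive).*method)(); // receiver still exists: forward the call
        }
        return R(); // receiver destroyed: do nothing, return a default value
    };
}

// Toy receiver used to demonstrate the wrapper.
struct Counter {
    int value = 0;
    int increment() { return ++value; }
};
```

A wrapper created with safeBind(weak, &Counter::increment) forwards to the counter while it is alive and returns 0 after the last owning shared_ptr is reset, which is the "effectively disconnects" behaviour described above.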

Monday, August 27, 2012

qt-mustache Templating Library

I had a need for a templating library for use with several Qt projects.  I was looking preferably for something simple that is easy to drop into a project and has a familiar syntax.  Existing libraries that I found included Grantlee (a featureful library using Django template syntax), Qustache and QCTemplate (a thin wrapper around Google's CTemplate library for logic-free templates which inspired Mustache).  None of these were quite what I was looking for, so I wrote a small library which uses the popular Mustache template syntax.

Example Usage:
#include "mustache.h"

QVariantHash contact;
contact["name"] = "John Smith";
contact["email"] = "";

QString contactTemplate = "<b>{{name}}</b> <a href=\"mailto:{{email}}\">{{email}}</a>";

Mustache::Renderer renderer;
Mustache::QtVariantContext context(contact);

QTextStream output(stdout);
output << renderer.render(contactTemplate, &context);
Output:
<b>John Smith</b> <a href="mailto:"></a>
The main feature, like Mustache itself, is that it doesn't have that many features.  The lack of logic constructs in templates prevents application logic from ending up in the templates themselves. Other 'features' are:

  • Lightweight.  Two source files.  The only dependency is QtCore.
  • Efficient.
  • Complete 'mustache' syntax support (values, sections, inverted sections, partials, lambdas, escaping). I may look at incorporating one or two facilities from Handlebars in future.
  • The standard data source is a QVariantMap or QVariantHash.  There is an interface if you wish to provide your own - eg. if you wanted to use a QAbstractItemModel as the data structure to fill in a template.
  • Partial templates can be specified as an in-memory map or .mustache files in a directory.  You can also provide your own loader if you want to be able to fetch partial templates from a different source.
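
To give a feel for what rendering plain values involves, here is a toy sketch in standard C++ that handles only {{name}} tags with HTML escaping. It is not how qt-mustache is implemented and it ignores sections, partials and lambdas. escapeHtml() and renderValues() are hypothetical helpers written for this post.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Escapes the characters that are significant in HTML.
std::string escapeHtml(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default: out += c;
        }
    }
    return out;
}

// Replaces each {{key}} with the escaped value from ctx. Unknown keys
// render as an empty string; an unterminated tag is copied through as-is.
std::string renderValues(const std::string& tmpl,
                         const std::map<std::string, std::string>& ctx) {
    std::string out;
    std::size_t pos = 0;
    while (pos < tmpl.size()) {
        std::size_t open = tmpl.find("{{", pos);
        if (open == std::string::npos) { out += tmpl.substr(pos); break; }
        std::size_t close = tmpl.find("}}", open + 2);
        if (close == std::string::npos) { out += tmpl.substr(pos); break; }
        assert(close >= open + 2);
        out += tmpl.substr(pos, open - pos); // literal text before the tag
        std::string key = tmpl.substr(open + 2, close - open - 2);
        auto it = ctx.find(key);
        if (it != ctx.end()) out += escapeHtml(it->second);
        pos = close + 2;
    }
    return out;
}
```

Because the template language has no logic of its own, everything beyond lookup and escaping stays in application code, which is the point made above.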

The code is available from GitHub (BSD license).

Thursday, July 21, 2011

Qt Inspector

Whilst debugging a widget layout problem a few days ago, I was looking around for a tool to view the structure of a Qt application without having to recompile it, or in other words, Firebug / Web Inspector for Qt widgets.  I found the KSpy tool in the KDE repositories which is in need of some love and there are a variety of tools to aid in runtime debugging and modification of QML but not much in the way of tools for QWidget-based interfaces.  Please let me know in the comments if I missed any.

I have put together a simple tool called Qt Inspector.

Qt Inspector starts a specified application or connects to an existing Qt application and once connected can:
  • Browse the object tree of Qt applications.
  • View properties of objects
  • Edit properties of objects
  • Locate a widget in the object tree by clicking on it in the application
  • Copy a reference to an object for use in a debugger (eg. to manipulate it by calling methods on it, examine member fields, setup conditional breakpoints)
Here is a screenshot of Qt Inspector connected to Dolphin showing the widget tree for the settings dialog. Like the Web Inspector or Firebug, this can be used to tweak styling settings, layouts and other properties without a recompile.


Qt Inspector can either attach to an existing application or launch a specified application and then attach to it.

From a terminal, this can be done with:

qtinspector [process ID]
qtinspector [program name] [args]


Qt Inspector operates by injecting a helper library into the target process using gdb.  This helper library sets up a local socket and listens for requests from the inspector process. The inspector and target process communicate via protocol buffer messages over this socket.

The inspector uses Qt's meta-object system to fetch the properties of an object and read/write their values, so properties need to be declared with Q_PROPERTY for them to be visible to the inspector.


The code is up on GitHub.  Please download it and give it a whirl.  Happy forking :) 

Update:  Eva Brucherseifer let me know about the Basyskom Inspector tool in Gitorious.  In addition to being able to select and inspect widgets it can also view signals and slots, application resources and take screenshots. 

Monday, April 12, 2010

We're hiring Qt developers

I'm currently working for Mendeley, a startup based in London. We're building software for organising, reading, annotating and collaborating on research papers (mostly in PDF format) which integrates with an online network for researchers. We're currently looking for developers to join the team working on our Qt-based desktop application for Windows, Mac and Linux.

Essential skills are:

  • Knowledge of C++ and experience debugging, testing and profiling C++ applications.

  • Experience with Qt. If you're the kind of person who likes delving into the internals of Qt that's even better.

  • Solid computer science basics.

Knowledge of any of the following would be particularly useful:

  • Model/view frameworks (especially Qt's implementation). An interest in or experience with some of the upcoming Qt technologies (eg. Qt Quick) would also be a plus.

  • Databases (in particular, SQLite)

  • Search/indexing frameworks (eg. Lucene)

  • Scripting languages (eg. Python, Ruby)

  • Version control (SVN, git).

  • Automated testing tools (eg. QtTest).

  • Knowledge of platform-specific APIs such as Cocoa on Mac*.

Involvement with open source projects is a big plus. Dogfooding your own software is always helpful, so if you have a background in research or even just like reading papers to find out how things work, that would also be useful.

If you're interested, please get in touch.

* Though Qt abstracts away most platform details, there are times when using native APIs is necessary.