Georg's Log

Sun 07 July 2019

Announcing MANPATH.be

Posted by Georg Sauthoff in misc   

Recently, I launched MANPATH.be - a site for convenient man page browsing. It provides access to the man pages of different distributions, including Fedora, CentOS and OpenSolaris. The about page concisely describes some of the site's features, e.g. human-readable links like f30/3/memcpy, permalinks and various kinds of inter-page links. The following sections give some detail on the motivation behind this project and the technical decisions in its design and implementation.

Motivation

The motivation behind this project is to solve use cases I'm interested in: looking up man pages of distributions I don't always have access to, quickly jumping between man pages of different distributions and versions, conveniently navigating man pages and creating exact and stable references to man pages.

These use cases come up when developing portable software, when working on source code in a restricted environment and when writing documents that require man page references like - say - technical Wikipedia articles, blog posts and Stack Overflow questions and answers.

Go

One reason behind implementing the HTTP manpage backend daemon in the Go programming language is to use this opportunity to get more familiar with Go in general and with the HTTP, template and SQL packages (of the Go standard library) in particular.

Also, some alternatives aren't necessarily that attractive. For example, Python, while a great choice for many use cases, is probably not the best fit for a potentially heavily loaded HTTP server.

Although there is the fine Flask package for Python that provides a microframework for implementing HTTP daemons, it arguably contains too much magic, which complicates some tasks. Flask has several deployment options, some of which involve middleware components that may lead to accidental complexity.

On the other hand, Go has some nice features that are well suited for such a daemon. For example, goroutines (and channels) allow for concurrent programming beyond relatively low-level threading and asynchronous constructs, which are also hard to combine. It also helps performance that Go slices are lightweight references (instead of objects that create copies, like slices of immutable Python objects), for example when using them inside a templating package. Another win for performance is that, although Go uses garbage collection, Go programs are compiled into machine code, whereas Python uses a simple virtual machine.
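As a rough illustration of the goroutine and channel style, the following sketch fans work out to one goroutine per item and collects the results over a channel. The functions renderPage and renderAll are hypothetical placeholders and not part of the MANPATH.be backend. The per-page work is reduced to upper-casing a string.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// renderPage is a hypothetical stand-in for some per-page work.
func renderPage(name string) string {
	return strings.ToUpper(name)
}

// renderAll runs renderPage for each name in its own goroutine and
// collects the results over a channel.
func renderAll(names []string) map[string]string {
	type result struct{ name, out string }
	results := make(chan result)
	var wg sync.WaitGroup
	for _, n := range names {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			results <- result{n, renderPage(n)}
		}(n)
	}
	// Close the channel once all workers are done so the range loop ends.
	go func() {
		wg.Wait()
		close(results)
	}()
	out := make(map[string]string, len(names))
	for r := range results {
		out[r.name] = r.out
	}
	return out
}

func main() {
	fmt.Println(renderAll([]string{"memcpy", "printf", "open"}))
}
```

No explicit locking is needed around the map because only the receiving goroutine writes to it. The workers communicate solely through the channel.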

However, Go isn't a perfect language. Some corners of the language are arguably too low-level, like - say - error handling. And there isn't really a culture of semantic versioning.

As expected, the fact that Go doesn't provide exceptions leads to most functions being interspersed with verbose and redundant error handling code. Rust, a language of similar age, also doesn't have exceptions but at least provides some means to eliminate redundant error checking code, such as the ? operator.
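The pattern in question looks roughly like the following sketch. The helpers parseSection and sectionRange are hypothetical, and restricting man sections to the numbers 1 to 9 is a simplification made only for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSection is a hypothetical helper that converts a man section
// string such as "3" or " 8 " into a number (assumed range: 1 to 9).
func parseSection(s string) (int, error) {
	n, err := strconv.Atoi(strings.TrimSpace(s))
	if err != nil {
		return 0, fmt.Errorf("parse section %q: %v", s, err)
	}
	if n < 1 || n > 9 {
		return 0, fmt.Errorf("section %d out of range", n)
	}
	return n, nil
}

// sectionRange shows how every fallible call needs its own
// if-err-return block, even when the handling is identical.
func sectionRange(lo, hi string) (int, int, error) {
	l, err := parseSection(lo)
	if err != nil {
		return 0, 0, err
	}
	h, err := parseSection(hi)
	if err != nil {
		return 0, 0, err
	}
	if l > h {
		return 0, 0, fmt.Errorf("empty range %d-%d", l, h)
	}
	return l, h, nil
}

func main() {
	fmt.Println(sectionRange("1", " 3 "))
	fmt.Println(sectionRange("1", "x"))
}
```

Three of the four statements in sectionRange exist only to pass an error upwards, which is the redundancy described above.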

Go also doesn't have C++-style RAII, and its absence makes it really simple to create resource leaks. In contrast, although Python implements reference counting, it also has with-statement context managers, which are arguably more convenient to use than the Go defer mechanism.
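For readers unfamiliar with defer, here is a minimal sketch. The resource type is a hypothetical stand-in for a file or database handle, which keeps the example free of real I/O.

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a hypothetical stand-in for a file or database handle.
type resource struct {
	name   string
	closed bool
}

// Close marks the resource as released.
func (r *resource) Close() { r.closed = true }

// use releases r via defer, so Close runs on the early error return
// as well as on the normal one.
func use(r *resource, fail bool) error {
	defer r.Close()
	if fail {
		return errors.New("processing " + r.name + " failed")
	}
	return nil
}

func main() {
	r := &resource{name: "memcpy.3"}
	err := use(r, true)
	fmt.Println("error:", err, "closed:", r.closed)
}
```

The cleanup only happens because the defer line was written explicitly. If it is forgotten, nothing complains and the resource stays open, whereas with RAII the release is tied to the object's lifetime.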

In conclusion, for this concrete use case, the advantages of Go outweigh its disadvantages and its standard library already covers many needs.

Python

The man page loader that pre-renders man pages for the backend and is able to bulk import man pages directly from a package repository is written in Python.

The execution speed of Python is good enough for this task, especially since the imports are batch jobs that happen at a low frequency.

On the other hand, the high-level nature of Python allows getting the job done in less code, e.g. when mangling strings and calling external processes. Also, the excellent SQLAlchemy Core package with its expressions leads to boilerplate-free, portable and compact code and thus simplifies interacting with relational databases a lot.

PostgreSQL

PostgreSQL (or Postgres) is a well performing and stable open-source relational database with many useful features. Its support for relatively recent additions to the SQL standard is generally good and better than what MySQL/MariaDB or even Oracle offers.

For example, Postgres supports transactions for DDL statements, whereas Oracle just implicitly commits the current transaction on the next DDL statement. A string of length zero is just that string and is not implicitly converted to NULL, as Oracle does. In contrast to Oracle, Postgres also supports a boolean data type. Postgres' upsert support is more versatile, robust and useful than what Oracle and MySQL offer. Postgres' support of aggregate functions is also superior to MySQL's.

Postgres also offers many useful string, regex and array functions like regexp_replace, regexp_split_to_array and array_remove.

In addition to that, Oracle isn't open source, comes with horrible licensing conditions and is a pain in the neck to install and maintain.

Postgres is included in the package repositories of most Linux distributions. And the Postgres project even maintains its own repositories for several distributions. For example, using the upstream Postgres repository, it's very simple to get the currently stable Postgres 11 on CentOS 7 (which provides version 9 in its base repository).

No JavaScript

Since JavaScript is a badly designed language, it's natural to strive for using it as little as possible. Also, with the current (and even not so current) state of web standards, CSS is often sufficient even for realizing adaptive ('responsive') layouts.

Some established frontend frameworks (e.g. Bootstrap) come with good design defaults and pre-defined components but also with metric tons of JavaScript, some of it just for supporting legacy browsers.

Besides it being a security risk to include tons of third-party obfuscated ('minified') JavaScript (which also increases page load times), I'm simply not interested in supporting very old legacy browsers.

Thus, the MANPATH.be site is completely JavaScript free.

CSS

It turns out that for the purpose of presenting man pages in a web browser (including a responsive layout), very little CSS is actually required (75 lines or so).

Looking into CSS for this project, I've learned some lessons. For example, the traditional simple three-column layout (navigation/main/other) has a name: the Holy Grail Layout.

Another one is that CSS has not just one mechanism but two for realizing responsive layouts: the CSS Grid Layout Module and the CSS Flexible Box Layout Module, also known as CSS Grid and Flexbox. Searching the web for CSS Grid is sometimes complicated by the fact that the results also include legacy CSS solutions that implement a grid without using the CSS Grid Layout Module. Also, a search may turn up framework-specific solutions that might or might not use CSS Grid internally.

The main difference between CSS Grid and Flexbox is often stated as Grid being developed for flowing elements in (up to) two dimensions, while Flexbox was developed for flowing elements along one axis (i.e. in a row or a column). Thus, CSS Grid sounds like the natural solution to the Holy Grail Layout problem.

But this isn't really accurate, as it's possible to construct a Flexbox column layout that degrades into rows if the screen is too small, and that without even using CSS media queries. In that sense, Flexbox is well suited for two-dimensional layouts, too.

It seems that the same responsive layout can't be implemented with CSS Grid alone. Instead, one has to work with CSS media queries to switch between different grids or even switch between a Grid layout and a Flexbox one. At least I didn't come up with a Grid-only solution, and all the CSS Grid examples I found online used CSS media queries.

I thus implemented the responsive three-column layout for MANPATH.be using CSS Flexbox constructs. The complexity of Flexbox looks well balanced, and being able to avoid CSS media queries reduces the complexity of the overall solution.