Indentation Without Dents

Multiline function invocations generally follow the same rule as for signatures. However, if the final argument begins a new block, the contents of the block may begin on a new line, indented one level.

– Style Guidelines, Rust Documentation

Automatic indentation can be a great joy to use - but also equally irritating when implemented incorrectly. In this article, I will attempt to guide you through writing a Vim indentation plugin for a subset of the MATLAB programming language. Just so that we are all on the same page, here is an example of what we want to be able to indent:

if true, disp foo, end

if true, if true
		A = [8 1
			3 5];
	end, end

While Vim indentation plugins are just files with Ex commands like any other Vim runtime files, there are some hoops to jump through that facilitate interplay between plugins and the user’s configuration. Filetype-specific indenting is enabled by the :filetype indent on command (see :help :filetype-indent-on). This loads the indent.vim file, which adds an autocommand that runs :runtime! indent/{filetype}.vim once per buffer for the current filetypes. Recall that :runtime! sources the matching file from every directory in 'runtimepath', in the order the directories are listed, which on Unix-like systems defaults to something like: "$HOME/.vim, …, $VIMRUNTIME, …". Now say that we create a new MATLAB indent plugin in $HOME/.vim/indent/matlab.vim to replace the default one found at $VIMRUNTIME/indent/matlab.vim. How would Vim know which one to choose?

The answer to that question is that indent plugins are assumed to begin with a so-called load guard:

" Only load if no other indent file is loaded
if exists('b:did_indent') | finish | endif
let b:did_indent = 1

This checks whether the current buffer has the b:did_indent variable defined (the b: prefix designates a variable local to the current buffer). If so, we halt execution; otherwise, we define it and continue. Since our home directory by default comes earlier in 'runtimepath' than $VIMRUNTIME, our new plugin gets the first shot at configuring indentation, and so the default plugin stops and does nothing.
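To see which files actually compete for the load guard, one can ask Vim directly. The snippet below is a small sketch and not part of the plugin: it uses globpath() to list every indent/matlab.vim on 'runtimepath' in lookup order. The variable name g:demo_indent_candidates is made up for illustration.

```vim
" Collect every indent/matlab.vim found on 'runtimepath', in lookup order.
" The fourth argument makes globpath() return a List instead of a String.
let g:demo_indent_candidates = globpath(&runtimepath, 'indent/matlab.vim', 0, 1)

" The first entry is sourced first, so its load guard is the one that wins.
echo join(g:demo_indent_candidates, "\n")
```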

How to indent: `'indentexpr'`

Next comes actually hooking into Vim’s indentation mechanism. Vim already has good support for indenting C-like languages. For other languages, however, it is achieved through two options, of which we will start with the first one: 'indentexpr'. When Vim calculates the proper indent for a line it evaluates 'indentexpr' with the v:lnum variable and cursor set to the line in question. The result should be the number of spaces of indentation (or -1 to keep the current indent). Writing the whole indent routine in a string expression would get cramped, so let us define a function GetMatlabIndent() and set 'indentexpr' to call it:

setlocal indentexpr=GetMatlabIndent()

" Only define the function once
if exists("*GetMatlabIndent") | finish | endif

function! GetMatlabIndent()
	return 0
endfunction

:setlocal is used to only set 'indentexpr' in the current buffer. While this has to be done once per buffer, it suffices to define GetMatlabIndent() only when running the script for the first time. Thus we check and only define the function when necessary (remember to comment this check out while developing iteratively, or your edits to the function will not take effect!). For now we will have the code stick to the left margin by always returning an indentation of zero spaces for every line.

Later we are going to want to return other indentations than zero. To honor the user’s choice of 'shiftwidth', the number of spaces to use per indent step, we will shift focus to indentation levels and therefore return indentlvl * shiftwidth() instead, which is also easier to reason about. (Sidenote: shiftwidth() is a simple wrapper around the user option 'shiftwidth', that takes care of some intricacies such as using 'tabstop' when 'shiftwidth' is zero.)
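As a minimal sketch of that conversion, the hypothetical helper below (the name DemoLevelToIndent is not from the real plugin) turns an indentation level into a number of columns:

```vim
" Hypothetical helper: convert an indentation level into a column count.
function! DemoLevelToIndent(indentlvl)
	" shiftwidth() falls back to 'tabstop' when 'shiftwidth' is zero
	return a:indentlvl * shiftwidth()
endfunction
```

With :setlocal shiftwidth=4, DemoLevelToIndent(2) gives 8. With :setlocal shiftwidth=0 tabstop=3 it gives 6, which is exactly the intricacy that shiftwidth() hides from us.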

So how do we actually obtain the indentation level? Well, this is obviously going to depend a lot on the language. If an official style guide exists, making the indentation conform to it is a great idea. Here I have tried to mimic the MATLAB R2018b editor. Let us start with what a naïve implementation could look like:

let prevlnum = prevnonblank(v:lnum - 1) " Get number of last non-blank line
let result = 0
if getline(prevlnum) =~ '\C^\s*\%(for\|if\| ... \|enumeration\)\>'
	let result += 1 " If last line opened a block: indent one level
endif
if getline(v:lnum) =~ '\C^\s*\%(end\|else\|elseif\|case\|otherwise\|catch\)\>'
	let result -= 1 " If current line closes a block: dedent one level
endif
" Get indentation level of last line and add new contribution
return (prevlnum > 0) * indent(prevlnum) + result * shiftwidth()

While a great start, this falls apart pretty quickly, the reason being that MATLAB, like many other languages, supports opening multiple blocks per line, e.g.:

if true, if true
		disp Hello
	end
end

Counting stuff with `search*()` and friends

Clearly a method is needed to count all block openers and closers and not only the first on each line. Let us define a function s:SubmatchCount() that takes a line number, a pattern and optionally a column, and counts the occurrences of each sub-expression in the pattern on the specified line, up to a given column, or, otherwise, the whole line:

function! s:SubmatchCount(lnum, pattern, ...)
	let endcol = a:0 >= 1 ? a:1 : 1 / 0
	...
endfunction

Some peculiarities about optional parameters in Vimscript: ... specifies that the function takes a variable number of extra arguments. Their count is given by a:0, and a:1 is then the first extra argument. So if there is at least one extra argument we set endcol to it, otherwise to 1 / 0, which Vim evaluates to the largest representable Number and which therefore serves as infinity. Then in the function body, we employ searchpos() to find the next match:

let x = [0, 0, 0, 0] " Create List to store counts in
call cursor(a:lnum, 1) " Set cursor to start of line
while 1
	" Search for pattern and move cursor to match
	" The `c` flag means we accept a match at the cursor position
	" And the `e` flag says that the cursor should be placed at the end of the match
	" With the `p` flag we get the index of the submatch that matched
	let [lnum, c, submatch] = searchpos(a:pattern, 'cpe', a:lnum)
	" If found no match, or match is past endcol, break
	if !submatch || c >= endcol | break | endif
	" If the match is not part of a comment or a string
	if !s:IsCommentOrString(lnum, c)
		" Increment counter. submatch is one more than the first submatch in the pattern
		let x[submatch - 2] += 1
	endif
	" Try to move the cursor one step to the right to not match the same text again
	" If it remained in place we hit the end of the line: break
	if cursor(0, c + 1) == -1 || col('.') == c | break | endif
endwhile
return x

The list x contains four elements because that many ought to be enough. The referenced function s:IsCommentOrString() is useful for most indentation scripts, and defined as:

" Returns whether a comment or string envelops the specified column.
function! s:IsCommentOrString(lnum, col)
	return synIDattr(synID(a:lnum, a:col, 1), "name")
		\ =~# 'matlabComment\|matlabMultilineComment\|matlabString'
endfunction

It hooks into Vim’s syntax machinery to query the name of the syntax item at the specified cursor position and return whether it is a comment or a string. It should be noted that this is a pretty expensive operation performance-wise. Nevertheless, all combined this allows us to accomplish what we set out to do:

function! s:GetOpenCloseCount(lnum, pattern, ...)
	let counts = call('s:SubmatchCount', [a:lnum, a:pattern] + a:000)
	return counts[0] - counts[1]
endfunction

That is, define s:GetOpenCloseCount() which returns how many blocks the line opens relative to how many it closes, given a pattern with sub-expressions for opening and closing patterns. The […] + a:000 syntax is Vim script for concatenating two Lists, where a:000 is a List of all extra arguments.
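The forwarding idiom can be tried in isolation. In the sketch below both functions are hypothetical stand-ins (DemoInner plays the role of s:SubmatchCount(), and -1 replaces the 1 / 0 default to keep the output readable):

```vim
" Hypothetical callee: takes an optional end column, like s:SubmatchCount()
function! DemoInner(lnum, pattern, ...)
	let endcol = a:0 >= 1 ? a:1 : -1
	return [a:lnum, a:pattern, endcol]
endfunction

" Hypothetical wrapper: passes its own optional arguments along unchanged
function! DemoOuter(lnum, pattern, ...)
	" a:000 is [] when no extra arguments were given, so this works either way
	return call('DemoInner', [a:lnum, a:pattern] + a:000)
endfunction
```

DemoOuter(3, 'end', 10) returns [3, 'end', 10], while DemoOuter(3, 'end') returns [3, 'end', -1]: the wrapper never needs to know how many optional arguments the callee accepts.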

A word on search*(): The search*() family of functions all accept the z flag, with which searching starts from the current column, instead of starting at the beginning of the line and skipping matches that occur before the column (relevant line in the source code). I guess this could end up making a difference if \zs was used in the pattern, but that is fairly niche. Additionally, adding the z flag to all search*() invocations led to a 35% reduction in run time in a quick-and-dirty benchmark (10 s vs 15 s on a 5000-line file). As the z flag was added fairly recently in patch 7.4.984,
let s:zflag = has('patch-7.4.984') ? 'z' : ''
may be used to check for it.
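A sketch of how the flag string might then be spliced into a search call follows. The helper DemoMatchColFrom is hypothetical; it also passes the n and W flags so that the search neither moves the cursor to the match nor wraps around the end of the file.

```vim
" Use the z flag only where it is available
let s:zflag = has('patch-7.4.984') ? 'z' : ''

" Hypothetical helper: column of the first match of pattern on line lnum,
" at or after column col, or 0 if there is none.
function! DemoMatchColFrom(lnum, col, pattern)
	call cursor(a:lnum, a:col)
	" c: accept a match at the cursor, n: do not jump to the match,
	" W: do not wrap around the end of the file
	return searchpos(a:pattern, 'cnW' . s:zflag, a:lnum)[1]
endfunction
```

On a line containing `end end`, starting from column 1 yields 1 and starting from column 2 yields 5, with or without the z flag; only the amount of work Vim does differs.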

Pay homage to Zalgo

We are now equipped with a tool to count things that open and close blocks, but one question remains: What are we supposed to search for? Time to bring out the ol’ trusty regex hammer. Let us define pair_pat as the pattern to pass to s:GetOpenCloseCount():

" All keywords that open blocks
let open_pat = 'function\|for\|if\|parfor\|spmd\|switch\|try\|while\|classdef\|properties\|methods\|events\|enumeration'

let pair_pat = '\C\<\(' . open_pat . '\|'
		\ . '\%(^\s*\)\@<=\%(else\|elseif\|case\|otherwise\|catch\)\)\>'
		\ . '\|\S\s*\zs\(\<end\>\)'

Hopefully we can discern the two sub-expressions enclosed by \(…\). Remember that the first one matches things that indent, and the second, things that dedent. So indent for each open_pat match in the previous line and on else/elseif/case/otherwise/catch at the start of the line (\@<= signifies positive lookbehind; ^\s* has to match before what follows). Then dedent for each end that is not at the start of the line (which is handled separately). Now the following

if getline(prevlnum) =~ '\C^\s*\%(for\|if\|enumeration\)\>'
	let result += 1 " If last line opened a block: indent one level
endif

may be replaced with:

if prevlnum
	let result += s:GetOpenCloseCount(prevlnum, pair_pat)
endif

Just this alone makes for a rather robust solution for simple languages.
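To get a feel for the counting without a buffer or syntax highlighting, here is a deliberately simplified, string-only stand-in. It is not how the plugin does it: DemoOpenCloseCount is a hypothetical name, it knows only a handful of keywords, it ignores comments and strings, and unlike pair_pat it also counts an end at the start of the line.

```vim
" Hypothetical, simplified stand-in for s:GetOpenCloseCount():
" operates on a String and ignores comments and strings.
function! DemoOpenCloseCount(text)
	let open_pat = '\C\<\%(for\|if\|while\|switch\|try\|function\)\>'
	let close_pat = '\C\<end\>'
	" With keepempty set, split() yields one more piece than there are matches
	let opens = len(split(a:text, open_pat, 1)) - 1
	let closes = len(split(a:text, close_pat, 1)) - 1
	return opens - closes
endfunction
```

DemoOpenCloseCount('if true, if true') returns 2, matching the two levels the following line should be indented, and DemoOpenCloseCount('if true, disp foo, end') returns 0.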

Reusing intermediate calculations

Next I thought it would be fun to see how one could go about implementing indentation of MATLAB brackets. These are interesting as they require context beyond the current line and the one above. Take, for example, this cell array literal:

myCell = {'text'
	{11;
	22; % <-- Not indented twice

	33}
	};

Indentation of the line containing 22 has to account for it already being inside one pair of braces. The following set of rules may be formulated for indenting the current line, given that bracketlevel is the number of nested brackets at the end of the line two lines above the current one, and curbracketlevel, one line above:

                      `curbracketlevel == 0`   `curbracketlevel > 0`
`bracketlevel == 0`   -                        indent
`bracketlevel > 0`    dedent                   -
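Read as code, the table amounts to the following sketch. The function name DemoBracketOffset is hypothetical, and the return value is an offset in indentation levels rather than columns.

```vim
" Hypothetical helper: indentation offset (in levels) dictated by the table.
function! DemoBracketOffset(bracketlevel, curbracketlevel)
	if a:bracketlevel == 0 && a:curbracketlevel > 0
		return 1 " The line above entered the outermost bracket: indent
	elseif a:bracketlevel > 0 && a:curbracketlevel == 0
		return -1 " The line above left the outermost bracket: dedent
	endif
	return 0 " Still outside, or still inside: no change
endfunction
```

For the line containing 22 in the example above, both levels are positive, so the offset is 0 and the line is not indented a second time.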

Having access to the function s:GetOpenCloseCount(), calculating bracketlevel and curbracketlevel should not prove too much of a hassle. If we are clever we can also deduce that it suffices to only consider the lines above with the same indentation, plus the nearest one with less indentation, assuming prior lines are correctly indented. The code becomes, with s:bracket_pair_pat as '\(\[\|{\)\|\(\]\|}\)':

let bracketlevel = 0
let previndent = indent(prevlnum) | let l = prevlnum
while 1
	let l = prevnonblank(l - 1)
	let indent = indent(l)
	if l <= 0 || previndent < indent | break | endif
	let bracketlevel += s:GetOpenCloseCount(l, s:bracket_pair_pat)
	if previndent != indent | break | endif
endwhile

let curbracketlevel = bracketlevel + s:GetOpenCloseCount(prevlnum, s:bracket_pair_pat)

Then, the indentation offset can be calculated using the table above. However, with this algorithm indentation becomes O(n^2) with respect to the number of lines indented. For a single line using the = operator this won’t matter, but imagine gg=G on a 3000-line file. Yikes! The key observation for solving this is that Vim indents lines in ascending order, and that curbracketlevel becomes bracketlevel for the next line. So we make bracketlevel a buffer-local variable, b:MATLAB_bracketlevel, namespacing it as appropriate, and update it at the end of GetMatlabIndent()! Profit?

Well, now if we were to indent line 29 and then jump to line 42 and indent it as well, we would reuse the potentially wrong value for b:MATLAB_bracketlevel. Likewise, if we indented a line, then edited it, and tried indenting the line below. Somehow the cache has to be invalidated. The solution lies in the b:changedtick variable, which gets incremented for each change (crucially not in-between indenting multiple consecutive lines with = however!). Let us introduce b:MATLAB_lastline and b:MATLAB_lasttick and update these after indenting, allowing us to write:

if b:MATLAB_lasttick != b:changedtick || b:MATLAB_lastline != prevlnum
	... " Recalculate bracket count like above
endif

Back to O(n) time complexity again!
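The bookkeeping around the cache could be sketched roughly as below. The two function names are hypothetical, and get() with a default of -1 is only my way of treating a buffer that has never been indented as a cache miss; the real plugin may initialize these variables differently.

```vim
" Hypothetical check: is the cached bracket level still usable for a line
" whose previous non-blank line is prevlnum?
function! DemoCacheValid(prevlnum)
	let tick_ok = get(b:, 'MATLAB_lasttick', -1) == b:changedtick
	let line_ok = get(b:, 'MATLAB_lastline', -1) == a:prevlnum
	return tick_ok && line_ok
endfunction

" Hypothetical update: remember which line was just indented, and when
function! DemoCacheUpdate(lnum)
	let b:MATLAB_lasttick = b:changedtick
	let b:MATLAB_lastline = a:lnum
endfunction
```

After DemoCacheUpdate(29), DemoCacheValid(29) holds only while indenting line 30 without any edit in between; jumping to line 42 or changing the buffer makes it fail, which triggers the slow recalculation.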

When to indent: `'indentkeys'`

The value of 'indentexpr' is not evaluated on every keystroke. Instead the option 'indentkeys' defines a string of comma-separated keys that should prompt recalculation of the indentation for the current line when typed in Insert mode. The keys follow a particular format that is neatly documented in :help indentkeys-format so I will not go into too much detail here. A cute little trick however is to append 0=elsei to 'indentkeys', which will emulate the IDE behavior of making the line jump back one level when typing the i before the f in elseif, as if indentation was calculated on every keystroke. It is just faking it but I find it fun.
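In an indent script the trick is a one-liner. The sketch below appends only the key discussed here to whatever 'indentkeys' already holds; a real plugin would likely set the full list of keys it cares about.

```vim
" Re-indent when elsei is typed at the start of a line (the leading 0),
" that is, on the i, one keystroke before elseif is complete.
setlocal indentkeys+=0=elsei

" Inspect the resulting buffer-local value
echo &l:indentkeys
```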

No sandbox play

Execution of indent scripts is not sandboxed; the regular Vim context is used. Changing the cursor position is the only side effect allowed by 'indentexpr'; it is always restored. All other forms of side effects would become apparent to the user. Editing files is also out of bounds.

The user of your plugin may have several options set that change standard Vim behavior or differ from your configuration. One should be aware of case sensitivity and magic-ness when using regular expressions and strive to write the file such that it works with any option settings. One such option is 'cpoptions' (short: 'cpo'), whose flags control Vi compatibility; to combat this we can reset it to its Vim default with set cpo&vim. This matters, for example, if we use line continuations. Following the general pattern, we store the value set by the user in a temporary variable to restore it after execution:

let s:keepcpo = &cpo
set cpo&vim
...
let &cpo = s:keepcpo
unlet s:keepcpo

Also, be aware of certain features not being compiled in. Use the has() function to check for available features and exists() for functions, options, et cetera.
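A short sketch of both kinds of check follows. The names DemoShiftWidth and g:demo_has_syntax are hypothetical, and falling back to the raw option value is an assumption about what an older Vim without shiftwidth() should do.

```vim
" Hypothetical fallback for Vim builds that predate the shiftwidth() function
function! DemoShiftWidth()
	return exists('*shiftwidth') ? shiftwidth() : &shiftwidth
endfunction

" Feature check: the comment/string test relies on syntax highlighting support
let g:demo_has_syntax = has('syntax')
```

Note the asterisk in exists('*shiftwidth'): it asks for a function of that name, whereas exists('&shiftwidth') would ask for the option.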

Thanks for reading! Hopefully this article will prove useful to you and generalize to whatever other languages you wish to support. One should also keep in mind that cindent() can be used to great effect even when using 'indentexpr' to do some fix-ups, but that is out of scope for this article. Writing indentation scripts can be perilous - but with a healthy test suite set up it can also be rather rewarding. This article should also serve as some kind of argument for why you would want to use something like tree-sitter instead of regexes. The full MATLAB indent file authored by me is found in the Vim source tree.

CC BY 4.0

First published in Vimways 2019.