# Hindley–Milner

In type theory, Hindley–Milner (HM) (also known as Damas–Milner or Damas–Hindley–Milner) is a classical type inference method with parametric polymorphism for the lambda calculus, first described by J. Roger Hindley[1] and later rediscovered by Robin Milner.[2] Luis Damas contributed a close formal analysis and proof of the method in his PhD thesis.[3][4]

Among the properties that make HM outstanding are its completeness and its ability to deduce the most general type of a given program without any type annotations or other hints. In practice HM is fast, computing a type in almost linear time with respect to the size of the source for typical programs, which makes it usable for typing large programs. HM is used mainly for functional languages. It was first implemented as part of the type system of the programming language ML. Since then, HM has been extended in various ways, most notably by constrained types as used in Haskell.

## Introduction

In organizing their original paper, Damas and Milner[4] clearly separated two very different tasks: describing what types an expression can have, and presenting an algorithm that actually computes a type. Keeping the two aspects apart allows one to focus separately on the logic (i.e. the meaning) behind the algorithm, and it establishes a benchmark against which the algorithm's properties can be judged.

How expressions and types fit together is described by means of a deductive system. Like any proof system, it allows different ways of reaching a conclusion, and since one and the same expression may arguably have different types, different conclusions about an expression are possible. By contrast, the type inference method itself (Algorithm W) is defined as a step-by-step procedure that leaves no choice about what to do next. Constructing the algorithm therefore involves decisions that are not present in the logic. These decisions demand a closer look and justification, and they would perhaps remain non-obvious without the above separation.

## Syntax

Logic and algorithm share the notions of "expression" and "type", whose form is made precise by the syntax.

The expressions to be typed are exactly those of the lambda calculus, enhanced by a let-expression.

Readers unfamiliar with the lambda calculus may be puzzled by the syntax, but it translates readily. The application ${\displaystyle e_{1}e_{2}}$ represents function application, often written ${\displaystyle e_{1}(e_{2})}$. The abstraction ${\displaystyle \lambda \ x.e}$ denotes an anonymous function or function literal, common in most contemporary programming languages, where it is perhaps only spelled more verbosely as ${\displaystyle {\texttt {function}}\,(x)\ {\texttt {return}}\ e\ {\texttt {end}}}$.
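
The expression forms described above can be summarized as a grammar. This is the usual presentation of the lambda calculus with let; the metavariables ${\displaystyle e}$ for expressions and ${\displaystyle x}$ for variables follow the notation used in the text, and the layout is only one possible choice.

${\displaystyle {\begin{array}{lrll}e&::=&x&{\text{variable}}\\&\mid &e_{1}\ e_{2}&{\text{application}}\\&\mid &\lambda \ x.e&{\text{abstraction}}\\&\mid &{\textbf {let}}\ x=e_{1}\ {\textbf {in}}\ e_{2}&{\text{let-expression}}\\\end{array}}}$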

Types as a whole are split into two groups, called mono- and polytypes.[note 1]

Monotypes ${\displaystyle \tau }$, syntactically terms, always designate a particular type, in the sense that a monotype is equal only to itself and different from all others. The most typical representatives of monotypes are type constants like ${\displaystyle int}$ or ${\displaystyle string}$. Types can also be parametric, like ${\displaystyle Map\ (Set\ string)\ int}$. All these types are examples of applications of type functions from a set ${\displaystyle D}$; in the examples just mentioned, ${\displaystyle \left\{int^{0},string^{0},Map^{2},Set^{1}\right\}\subset D}$, where the superscript indicates the number of type parameters. While the choice of ${\displaystyle D}$ is otherwise arbitrary, in the context of HM it must contain at least ${\displaystyle \rightarrow ^{2}}$, the type of functions, which is written infix for convenience; e.g. a function mapping integers to strings has type ${\displaystyle int\rightarrow string}$.[note 2]

Type variables are monotypes. Standing alone, a type variable ${\displaystyle \alpha }$ is meant to be as concrete as ${\displaystyle int}$ or ${\displaystyle \beta }$, and clearly different from both. Type variables occurring as monotypes behave as if they were type constants about which no further information is available. Correspondingly, a function typed ${\displaystyle \alpha \rightarrow \alpha }$ only maps values of the particular type ${\displaystyle \alpha }$ to values of that same type. Such a function can only be applied to values having type ${\displaystyle \alpha }$ and to no others.

A function with polytype ${\displaystyle \forall \alpha .\alpha \rightarrow \alpha }$, by contrast, can map a value of any type to a value of the same type, and the identity function is a value of this type. As another example, ${\displaystyle \forall \alpha .(Set\ \alpha )\rightarrow int}$ is the type of a function mapping finite sets of any element type to integers; the count of members is a value of this type. Note that qualifiers can only appear at the top level, i.e. a type such as ${\displaystyle \forall \alpha .\alpha \rightarrow \forall \alpha .\alpha }$ is excluded by the syntax of types. Since monotypes are included in the polytypes, a type has the general form ${\displaystyle \forall \alpha _{1}\dots \forall \alpha _{n}.\tau }$.
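
The two groups of types described above can likewise be written as a grammar. The sketch below follows the notation of the text (${\displaystyle \tau }$ for monotypes, ${\displaystyle \sigma }$ for polytypes, ${\displaystyle D}$ for type functions); the row labels are added here for illustration only.

${\displaystyle {\begin{array}{llrll}{\text{mono}}&\tau &::=&\alpha &{\text{type variable}}\\&&\mid &D\ \tau \dots \tau &{\text{application of a type function}}\\{\text{poly}}&\sigma &::=&\tau &\\&&\mid &\forall \alpha .\sigma &{\text{qualifier}}\\\end{array}}}$

For instance, ${\displaystyle int\rightarrow string}$ is the application of the type function ${\displaystyle \rightarrow ^{2}}$ to the two monotypes ${\displaystyle int}$ and ${\displaystyle string}$.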

### Free type variables

In a type ${\displaystyle \forall \alpha _{1}\dots \forall \alpha _{n}.\tau }$, the symbol ${\displaystyle \forall }$ is the qualifier binding the type variables ${\displaystyle \alpha _{i}}$ in the monotype ${\displaystyle \tau }$. The variables ${\displaystyle \alpha _{i}}$ are called qualified. Any occurrence of a qualified type variable in ${\displaystyle \tau }$ is called bound, and all unbound type variables in ${\displaystyle \tau }$ are called free. As in the lambda calculus, the notions of free and bound variables are essential for understanding the meaning of types.

This is arguably the hardest part of HM, perhaps because polytypes containing free variables are not represented in programming languages like Haskell. Likewise, one does not have clauses with free variables in Prolog. Developers experienced with both languages, who actually know all the prerequisites of HM, are particularly likely to overlook this point. In Haskell, for example, all type variables implicitly occur qualified, i.e. a Haskell type ${\displaystyle {\texttt {a->a}}}$ means ${\displaystyle \forall \alpha .\alpha \rightarrow \alpha }$. Because a type like ${\displaystyle \alpha \rightarrow \alpha }$ cannot be expressed there, though it may occur in practice in a Haskell program, it is easily confused with its qualified version.

So what function can have a type like ${\displaystyle \forall \beta .\beta \rightarrow \alpha }$, i.e. a mixture of bound and unbound type variables, and what could the free type variable ${\displaystyle \alpha }$ therein mean?

Consider ${\displaystyle foo}$ in Example 1, with type annotations in brackets. Its parameter ${\displaystyle y}$ is not used in the body, but the variable ${\displaystyle x}$ bound in the outer context of ${\displaystyle foo}$ is. As a consequence, ${\displaystyle foo}$ accepts every value as argument, while returning a value bound outside and, with it, that value's type. By contrast, ${\displaystyle bar}$ has type ${\displaystyle \forall \alpha .\forall \beta .\alpha \rightarrow (\beta \rightarrow \alpha )}$, in which all occurring type variables are bound. Evaluating, for instance, ${\displaystyle {\mathit {bar}}\ 1}$ results in a function of type ${\displaystyle \forall \beta .\beta \rightarrow \ {\mathit {int}}}$, reflecting that foo's monotype ${\displaystyle \alpha }$ in ${\displaystyle \forall \beta .\beta \rightarrow \alpha }$ has been refined by this call.

In this example, the free monotype variable ${\displaystyle \alpha }$ in foo's type becomes meaningful by being qualified in the outer scope, namely in bar's type. That is, in the context of the example, the same type variable ${\displaystyle \alpha }$ appears both bound and free in different types. As a consequence, without knowing the context, a free type variable cannot be interpreted as anything more than some monotype. Turning the statement around, a typing is in general not meaningful without a context.
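
A program consistent with this description of Example 1 can be written as follows, with the type annotations in brackets. It is a reconstruction from the types and bindings stated in the text; the exact layout is an assumption.

${\displaystyle {\begin{array}{l}{\textbf {let}}\ {\mathit {bar}}\ [\forall \alpha .\forall \beta .\alpha \rightarrow (\beta \rightarrow \alpha )]=\lambda \ x.\\\quad {\textbf {let}}\ {\mathit {foo}}\ [\forall \beta .\beta \rightarrow \alpha ]=\lambda \ y.x\\\quad {\textbf {in}}\ {\mathit {foo}}\\{\textbf {in}}\ {\mathit {bar}}\\\end{array}}}$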

### Context and typing

Consequently, to bring the so far disjoint parts of the syntax, expressions and types, together meaningfully, a third part, the context, is needed. Syntactically, it is a list of pairs ${\displaystyle x:\sigma }$, called assignments or assumptions, stating a type ${\displaystyle \sigma _{i}}$ for each value variable ${\displaystyle x_{i}}$ therein. All three parts combined give a typing of the form ${\displaystyle \Gamma \ \vdash \ e:\sigma }$, stating that under assumptions ${\displaystyle \Gamma }$, the expression ${\displaystyle e}$ has type ${\displaystyle \sigma }$.

With the complete syntax at hand, one can finally make a meaningful statement about the type of ${\displaystyle foo}$ in Example 1 above, namely ${\displaystyle x:\alpha \vdash \lambda \ y.x:\forall \beta .\beta \rightarrow \alpha }$. Contrary to the above formulations, the monotype variable ${\displaystyle \alpha }$ no longer appears unbound, i.e. meaningless, but bound in the context as the type of the value variable ${\displaystyle x}$. Whether a type variable is bound or free in the context evidently plays a significant role for a type as part of a typing, so ${\displaystyle free(\ \Gamma \ )}$ needs to be made precise: it is the union of the free type variables of all types assigned in ${\displaystyle \Gamma }$.
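
The notion of free type variables for monotypes, polytypes and contexts can be sketched in a few lines of Python. This is a minimal illustration, not part of the original presentation: the class names `TVar`, `TApp` and `Forall`, the helper `fn`, and the representation of a context as a dictionary are all assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TVar:            # type variable, e.g. alpha
    name: str

@dataclass(frozen=True)
class TApp:            # application of a type function D to monotypes
    ctor: str
    args: tuple = ()

@dataclass(frozen=True)
class Forall:          # polytype: qualified variable names plus a monotype body
    bound: tuple
    body: object

def free(t):
    """Free type variables of a monotype, a polytype, or a context (a dict)."""
    if isinstance(t, TVar):
        return {t.name}
    if isinstance(t, TApp):
        return set().union(*(free(a) for a in t.args))
    if isinstance(t, Forall):
        return free(t.body) - set(t.bound)
    if isinstance(t, dict):   # context: value variable -> type
        return set().union(*(free(s) for s in t.values()))
    raise TypeError(f"not a type or context: {t!r}")

def fn(a, b):
    """Hypothetical helper building the function type a -> b."""
    return TApp("->", (a, b))

# forall beta. beta -> alpha : beta is bound, alpha is free
foo_type = Forall(("beta",), fn(TVar("beta"), TVar("alpha")))
gamma = {"x": TVar("alpha")}          # the context  x : alpha

print(free(foo_type))   # {'alpha'}
print(free(gamma))      # {'alpha'}
```

In the typing of ${\displaystyle foo}$, the variable ${\displaystyle \alpha }$ is free both in the type and in the context, which is exactly the situation in which it must not be qualified.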

### Note on expressiveness

Since the expression syntax might appear far too inexpressive to readers unfamiliar with the lambda calculus, and because the examples given below will likely support this misconception, some notes explaining that HM does not deal with toy languages might be helpful. As a central result of research on computability, the expression syntax defined above (even without the let-variant) is able to express any computable function. Moreover, all other programming language constructs can be transformed fairly directly, by syntactic means, into expressions of the lambda calculus. For this reason, this simple expression language is used as a model for programming languages in research. A method known to work well for the lambda calculus can be extended to all, or at least many, other syntactic constructs of a particular programming language using the syntactic transformations just mentioned.

As an example, the additional expression variant ${\displaystyle {\textbf {let}}\ x=e_{1}\ {\textbf {in}}\ e_{2}}$ can be transformed to ${\displaystyle (\lambda x.e_{2})\ e_{1}}$. It is added to the expression syntax in HM only to support generalization during type inference, not because the syntax lacks computational strength. Thus HM deals with inference of types in programs in general, and the various functional languages using this method demonstrate how well a result formulated only for the syntax of the lambda calculus can be extended to syntactically complex languages.

Contrary to the impression that the expressions might be too inexpressive for practical application, they are actually far too expressive to be meaningfully typed in every case. This is a consequence of the decision problem being undecidable for anything as expressive as the expressions of the lambda calculus. Consequently, computing typings is a hopeless venture in general. Depending on the nature of the type system, it will either never terminate or otherwise refuse to work.

HM belongs to the latter group of type systems. A collapse of the type system then presents itself as a more subtle situation, in that suddenly one and the same type is yielded for all the expressions of interest. This is not a fault in HM but is inherent in the problem of typing itself, and it can easily be created within any strongly typed programming language, e.g. by coding an evaluator (the universal function) for the "too simple" expressions. One then has a single concrete type that represents the universal data type, as is usual in untyped languages. The type system of the host programming language has then collapsed and can no longer differentiate between the various types of values handed to or produced by the evaluator. In this context, it still delivers or checks types, but always the same one, just as if the type system were no longer present at all.

## Polymorphic type order

While the equality of monotypes is purely syntactical, polytypes offer a richer structure by being related to other types through a specialization relation ${\displaystyle \sigma \sqsubseteq \sigma '}$ expressing that ${\displaystyle \sigma '}$ is more special than ${\displaystyle \sigma }$.

When applied to a value, a polymorphic function has to change its shape, specializing to deal with this particular type of value. During this process, it also changes its type to match that of the parameter. If, for instance, the identity function having type ${\displaystyle \forall \alpha .\alpha \rightarrow \alpha }$ is to be applied to a number having type ${\displaystyle int}$, the two cannot work together as they stand, because the types are different and nothing fits. What is needed is a function of type ${\displaystyle int\rightarrow int}$. Thus, during application, the polymorphic identity is specialized to a monomorphic version of itself. In terms of the specialization relation, one writes ${\displaystyle \forall \alpha .\alpha \rightarrow \alpha \sqsubseteq \ int\rightarrow int}$.

Now the shape shifting of polymorphic values is not fully arbitrary but rather limited by their pristine polytype. Following what has happened in the example, one could paraphrase the rule of specialization by saying that a polymorphic type ${\displaystyle \forall \alpha .\tau }$ is specialized by consistently replacing each occurrence of ${\displaystyle \alpha }$ in ${\displaystyle \tau }$ and dropping the qualifier. While this rule works well for any monotype used as a replacement, it fails when a polytype, say ${\displaystyle \forall \beta .\beta }$, is tried as a replacement, resulting in the non-syntactical type ${\displaystyle \forall \beta .\beta \rightarrow \forall \beta .\beta }$. But not only that. Even if types with nested qualifiers were allowed in the syntax, the result of the substitution would no longer preserve the property of the pristine type that the parameter and the result of the function have the same type. The two would now be only seemingly equal, because both subtypes have become independent of each other. The parameter and the result could then be specialized with different types, resulting in e.g. ${\displaystyle string\rightarrow Set\ int}$, hardly the right type for an identity function.
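
The paraphrased rule, consistent replacement of a qualified variable by a monotype followed by dropping the qualifier, can be illustrated with a short Python sketch. The names `TVar`, `TApp`, `Forall`, `substitute` and `specialize` are hypothetical and chosen for this example only; the sketch rejects polytypes as replacements, mirroring the restriction discussed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TVar:
    name: str

@dataclass(frozen=True)
class TApp:
    ctor: str
    args: tuple = ()

@dataclass(frozen=True)
class Forall:
    bound: tuple       # names of the qualified variables
    body: object       # a monotype

def variables(t):
    """All type variable names occurring in a monotype."""
    if isinstance(t, TVar):
        return {t.name}
    return set().union(*(variables(a) for a in t.args))

def substitute(t, mapping):
    """Consistently replace type variables in monotype t according to mapping."""
    if isinstance(t, TVar):
        return mapping.get(t.name, t)
    return TApp(t.ctor, tuple(substitute(a, mapping) for a in t.args))

def specialize(scheme, mapping):
    """Replace qualified variables by monotypes and drop their qualifiers."""
    rest = tuple(v for v in scheme.bound if v not in mapping)
    for name, repl in mapping.items():
        if name not in scheme.bound:
            raise ValueError(f"{name} is not qualified in the polytype")
        if isinstance(repl, Forall):
            raise ValueError("only monotypes may be substituted")
        if variables(repl) & set(rest):
            raise ValueError("replacement would be captured by a qualifier")
    body = substitute(scheme.body, mapping)
    return Forall(rest, body) if rest else body

INT = TApp("int")
identity = Forall(("alpha",), TApp("->", (TVar("alpha"), TVar("alpha"))))

# forall alpha. alpha -> alpha  specializes to  int -> int
print(specialize(identity, {"alpha": INT}) == TApp("->", (INT, INT)))   # True
```

Because both occurrences of ${\displaystyle \alpha }$ are replaced by the same monotype, the parameter and result types stay equal, which is the property the text requires of a specialized identity function.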

The syntactic restriction allowing qualification only at the top level is imposed to prevent generalization while specializing. Instead of ${\displaystyle \forall \beta .\beta \rightarrow \forall \beta .\beta }$, the more special type ${\displaystyle \forall \beta .\beta \rightarrow \beta }$ must be produced in this case.

One could undo the former specialization by specializing on some value of type ${\displaystyle \forall \alpha .\alpha }$ again. In terms of the relation, one obtains ${\displaystyle \forall \alpha .\alpha \rightarrow \alpha \sqsubseteq \forall \beta .\beta \rightarrow \beta \sqsubseteq \forall \alpha .\alpha \rightarrow \alpha }$ as a summary, meaning that syntactically different polytypes are equal up to renaming of their qualified variables.

Now focusing only on the question of whether a type is more special than another, and no longer on what the specialized type is used for, one can summarize specialization in a single rule. Paraphrasing it, a type ${\displaystyle \forall \alpha _{1}\dots \forall \alpha _{n}.\tau }$ is specialized by consistently replacing any of the qualified variables ${\displaystyle \alpha _{i}}$ by arbitrary monotypes ${\displaystyle \tau _{i}}$, yielding a monotype ${\displaystyle \tau '}$. Finally, type variables in ${\displaystyle \tau '}$ not occurring free in the pristine type can optionally be qualified.

Thus the specialization rule makes sure that no free variable, i.e. monotype, in the pristine type becomes unintentionally bound by a qualifier, while originally qualified variables can be replaced with anything, even with types introducing new qualified or unqualified type variables.

Starting with a polytype ${\displaystyle \forall \alpha .\alpha }$, the specialization could replace the body either by another qualified variable, which is in effect a renaming, or by some type constant (including the function type), which may or may not have parameters filled with either monotypes or qualified type variables. Once a qualified variable is replaced by a type application, this specialization cannot be undone through another substitution, as was possible for qualified variables. Thus the type application is there to stay. Only if it contains another qualified type variable can the specialization continue by replacing it further.

So the specialization introduces no further equivalence on polytypes besides the already known renaming. Polytypes are syntactically equal up to renaming of their qualified variables. The equality of types is a reflexive, antisymmetric and transitive relation, and the remaining specializations of polytypes are transitive; with this, the relation ${\displaystyle \sqsubseteq }$ is an order.
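
Written as a rule, the paraphrase reads as follows. This is the form commonly used in presentations of the Damas–Milner system; the names ${\displaystyle \beta _{i}}$ for the variables that are newly qualified are introduced here for illustration.

${\displaystyle \displaystyle {\frac {\tau '=\left\{\alpha _{i}\mapsto \tau _{i}\right\}\tau \quad \beta _{i}\not \in free(\forall \alpha _{1}\dots \forall \alpha _{n}.\tau )}{\forall \alpha _{1}\dots \forall \alpha _{n}.\tau \sqsubseteq \forall \beta _{1}\dots \forall \beta _{m}.\tau '}}}$

As a worked instance, choosing ${\displaystyle \tau _{1}=Set\ \beta }$ for ${\displaystyle \alpha }$ in ${\displaystyle \forall \alpha .\alpha \rightarrow \alpha }$ and then qualifying ${\displaystyle \beta }$ gives ${\displaystyle \forall \alpha .\alpha \rightarrow \alpha \sqsubseteq \forall \beta .(Set\ \beta )\rightarrow (Set\ \beta )}$.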

## Deductive system

The syntax of HM is carried forward to the syntax of the inference rules that form the body of the formal system, by using the typings as judgments. Each of the rules defines what conclusion can be drawn from what premises. In addition to the judgments, some of the extra conditions introduced above may be used as premises, too.

A proof using the rules is a sequence of judgments such that all premises are listed before a conclusion. See Examples 2 and 3 below for a possible format of proofs. From left to right, each line shows the conclusion, the ${\displaystyle [{\texttt {Name}}]}$ of the rule applied, and the premises, either by referring to an earlier line (number) if the premise is a judgment or by making the predicate explicit.

### Typing rules

The HM type system has six deduction rules. One can roughly divide them into two groups:

The first four rules ${\displaystyle [{\texttt {Var}}]}$, ${\displaystyle [{\texttt {App}}]}$, ${\displaystyle [{\texttt {Abs}}]}$ and ${\displaystyle [{\texttt {Let}}]}$ are centered around the syntax, presenting one rule for each of the expression forms. Their meaning is fairly obvious at first glance: each decomposes an expression, proves its sub-expressions and finally combines the individual types found in the premises into the type in the conclusion.

The second group is formed by the remaining two rules ${\displaystyle [{\texttt {Inst}}]}$ and ${\displaystyle [{\texttt {Gen}}]}$. They handle specialization and generalization of types. While the rule ${\displaystyle [{\texttt {Inst}}]}$ should be clear from the section on specialization above, ${\displaystyle [{\texttt {Gen}}]}$ complements the former, working in the opposite direction. It allows generalization, i.e. qualifying monotype variables that are not bound in the context. The need for the restriction ${\displaystyle \alpha \not \in free(\ \Gamma \ )}$ was introduced in the section on free type variables.

The following two examples show the rule system in action.

Example 2: A proof of ${\displaystyle \Gamma \vdash id(n):int}$, where ${\displaystyle \Gamma }$ contains the assumptions ${\displaystyle id:\forall \alpha .\alpha \rightarrow \alpha }$ and ${\displaystyle n:int}$, could be written:
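
The six rules can be stated as follows. This is the standard declarative formulation of the system; the metavariable names (${\displaystyle e_{0}}$, ${\displaystyle e_{1}}$, ${\displaystyle \tau }$, ${\displaystyle \tau '}$, ${\displaystyle \sigma }$, ${\displaystyle \sigma '}$) are chosen to match the premises quoted elsewhere in the text.

${\displaystyle {\begin{array}{cl}\displaystyle {\frac {x:\sigma \in \Gamma }{\Gamma \vdash x:\sigma }}&[{\texttt {Var}}]\\\displaystyle {\frac {\Gamma \vdash e_{0}:\tau \rightarrow \tau '\quad \Gamma \vdash e_{1}:\tau }{\Gamma \vdash e_{0}\ e_{1}:\tau '}}&[{\texttt {App}}]\\\displaystyle {\frac {\Gamma ,\ x:\tau \vdash e:\tau '}{\Gamma \vdash \lambda \ x.e:\tau \rightarrow \tau '}}&[{\texttt {Abs}}]\\\displaystyle {\frac {\Gamma \vdash e_{0}:\sigma \quad \Gamma ,\ x:\sigma \vdash e_{1}:\tau '}{\Gamma \vdash {\textbf {let}}\ x=e_{0}\ {\textbf {in}}\ e_{1}:\tau '}}&[{\texttt {Let}}]\\\displaystyle {\frac {\Gamma \vdash e:\sigma '\quad \sigma '\sqsubseteq \sigma }{\Gamma \vdash e:\sigma }}&[{\texttt {Inst}}]\\\displaystyle {\frac {\Gamma \vdash e:\sigma \quad \alpha \not \in free(\Gamma )}{\Gamma \vdash e:\forall \alpha .\sigma }}&[{\texttt {Gen}}]\\\end{array}}}$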

${\displaystyle {\begin{array}{llll}1:&\Gamma \vdash id:\forall \alpha .\alpha \rightarrow \alpha &[{\texttt {Var}}]&(id:\forall \alpha .\alpha \rightarrow \alpha \in \Gamma )\\2:&\Gamma \vdash id:int\rightarrow int&[{\texttt {Inst}}]&(1),\ (\forall \alpha .\alpha \rightarrow \alpha \sqsubseteq int\rightarrow int)\\3:&\Gamma \vdash n:int&[{\texttt {Var}}]&(n:int\in \Gamma )\\4:&\Gamma \vdash id(n):int&[{\texttt {App}}]&(2),\ (3)\\\end{array}}}$

Example 3: To demonstrate generalization, ${\displaystyle \vdash \ {\textbf {let}}\,id=\lambda x.x\ {\textbf {in}}\ id\,:\,\forall \alpha .\alpha \rightarrow \alpha }$ is shown below:

${\displaystyle {\begin{array}{llll}1:&x:\alpha \vdash x:\alpha &[{\texttt {Var}}]&(x:\alpha \in \left\{x:\alpha \right\})\\2:&\vdash \lambda x.x:\alpha \rightarrow \alpha &[{\texttt {Abs}}]&(1)\\3:&\vdash \lambda x.x:\forall \alpha .\alpha \rightarrow \alpha &[{\texttt {Gen}}]&(2),\ (\alpha \not \in free(\epsilon ))\\4:&id:\forall \alpha .\alpha \rightarrow \alpha \vdash id:\forall \alpha .\alpha \rightarrow \alpha &[{\texttt {Var}}]&(id:\forall \alpha .\alpha \rightarrow \alpha \in \left\{id:\forall \alpha .\alpha \rightarrow \alpha \right\})\\5:&id:\forall \alpha .\alpha \rightarrow \alpha \vdash id:\alpha \rightarrow \alpha &[{\texttt {Inst}}]&(4),\ (\forall \alpha .\alpha \rightarrow \alpha \sqsubseteq \alpha \rightarrow \alpha )\\6:&\vdash {\textbf {let}}\,id=\lambda x.x\ {\textbf {in}}\ id\,:\,\alpha \rightarrow \alpha &[{\texttt {Let}}]&(3),\ (5)\\7:&\vdash {\textbf {let}}\,id=\lambda x.x\ {\textbf {in}}\ id\,:\,\forall \alpha .\alpha \rightarrow \alpha &[{\texttt {Gen}}]&(6),\ (\alpha \not \in free(\epsilon ))\\\end{array}}}$

### Principal type

As mentioned in the introduction, the rules allow one to deduce different types for one and the same expression. See, for instance, Example 2, steps 1 and 2, and Example 3, steps 2 and 3, for three different typings of the same expression. Clearly, the different results are not fully unrelated, but connected by the type order. It is an important property of the rule system and this order that whenever more than one type can be deduced for an expression, among them is (modulo alpha-renaming of the type variables) a unique most general type, in the sense that all others are specializations of it. Though the rule system must allow specialized types to be derived, a type inference algorithm should deliver this most general or principal type as its result.

### Let-polymorphism

Though not immediately visible, the rule set encodes a regulation of the circumstances under which a type may be generalized, through a slightly varying use of mono- and polytypes in the rules ${\displaystyle [{\texttt {Abs}}]}$ and ${\displaystyle [{\texttt {Let}}]}$.

In rule ${\displaystyle [{\texttt {Abs}}]}$, the value variable of the parameter of the function ${\displaystyle \lambda x.e}$ is added to the context with a monomorphic type through the premise ${\displaystyle \Gamma ,\ x:\tau \vdash e:\tau '}$, while in the rule ${\displaystyle [{\texttt {Let}}]}$, the variable enters the environment in polymorphic form, ${\displaystyle \Gamma ,\ x:\sigma \vdash e_{1}:\tau '}$. Though in both cases the presence of x in the context prevents the use of the generalization rule for any monotype variable in the assignment, this regulation forces the parameter x in a ${\displaystyle \lambda }$-expression to remain monomorphic, while in a let-expression the variable can already be introduced polymorphically, making specializations possible.

As a consequence of this regulation, no type can be inferred for ${\displaystyle \lambda f.(f\,{\textrm {true}},f\,{\textrm {0}})}$, since the parameter ${\displaystyle f}$ is in a monomorphic position, while ${\displaystyle {\textbf {let}}\ f=\lambda x.x\,{\textbf {in}}\,(f\,{\textrm {true}},f\,{\textrm {0}})}$ yields the type ${\displaystyle (bool,int)}$, because ${\displaystyle f}$ has been introduced in a let-expression and is therefore treated as polymorphic. Note that this behaviour is in strong contrast to the usual definition ${\displaystyle {\textbf {let}}\ x=e_{1}\ {\textbf {in}}\ e_{2}\ ::=(\lambda \ x.e_{2})\ e_{1}}$, and it is the reason why the let-expression appears in the syntax at all. This distinction is called let-polymorphism or let generalization and is a concept owed to HM.

## Towards an algorithm

Now that the deduction system of HM is at hand, one could present an algorithm and validate it with respect to the rules. Alternatively, it might be possible to derive it by taking a closer look at how the rules interact and how proofs are formed. This is done in the remainder of this article, focusing on the possible decisions one can make while proving a typing.

### Degrees of freedom choosing the rules

Isolating the points in a proof where no decision is possible at all, the first group of rules, centered around the syntax, leaves no choice, since to each syntactical rule corresponds a unique typing rule, which determines a part of the proof. Between the conclusion and the premises of these fixed parts, chains of ${\displaystyle [{\texttt {Inst}}]}$ and ${\displaystyle [{\texttt {Gen}}]}$ could occur. Such a chain could also exist between the conclusion of the proof and the rule for the topmost expression. All proofs must have the shape sketched in this way.

Because the only choice in a proof with respect to rule selection lies in the ${\displaystyle [{\texttt {Inst}}]}$ and ${\displaystyle [{\texttt {Gen}}]}$ chains, the form of the proof suggests the question of whether one can state more precisely where these chains might be needed. This is in fact possible and leads to a variant of the rule system with no such rules.

### Syntax-directed rule system

A contemporary treatment of HM uses a purely syntax-directed rule system due to Clement[5] as an intermediate step. In this system, the specialization is located directly after the original ${\displaystyle [{\texttt {Var}}]}$ rule and merged into it, while the generalization becomes part of the ${\displaystyle [{\texttt {Let}}]}$ rule. There the generalization is also determined to always produce the most general type by introducing the function ${\displaystyle {\bar {\Gamma }}(\tau )}$, which qualifies all monotype variables not bound in ${\displaystyle \Gamma }$.
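
The syntax-directed system described above can be written out as follows. The rules are given in the form usual for this variant; the notation ${\displaystyle {\hat {\alpha }}}$ for the set of variables to be qualified is an assumption made for this sketch.

${\displaystyle {\begin{array}{cl}\displaystyle {\frac {x:\sigma \in \Gamma \quad \sigma \sqsubseteq \tau }{\Gamma \vdash _{S}x:\tau }}&[{\texttt {Var}}]\\\displaystyle {\frac {\Gamma \vdash _{S}e_{0}:\tau \rightarrow \tau '\quad \Gamma \vdash _{S}e_{1}:\tau }{\Gamma \vdash _{S}e_{0}\ e_{1}:\tau '}}&[{\texttt {App}}]\\\displaystyle {\frac {\Gamma ,\ x:\tau \vdash _{S}e:\tau '}{\Gamma \vdash _{S}\lambda \ x.e:\tau \rightarrow \tau '}}&[{\texttt {Abs}}]\\\displaystyle {\frac {\Gamma \vdash _{S}e_{0}:\tau \quad \Gamma ,\ x:{\bar {\Gamma }}(\tau )\vdash _{S}e_{1}:\tau '}{\Gamma \vdash _{S}{\textbf {let}}\ x=e_{0}\ {\textbf {in}}\ e_{1}:\tau '}}&[{\texttt {Let}}]\\\end{array}}}$

Here the generalization function is ${\displaystyle {\bar {\Gamma }}(\tau )=\forall {\hat {\alpha }}.\tau }$ with ${\displaystyle {\hat {\alpha }}=free(\tau )-free(\Gamma )}$.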

Formally, to validate that this new rule system ${\displaystyle \vdash _{S}}$ is equivalent to the original ${\displaystyle \vdash _{D}}$, one has to show that ${\displaystyle \Gamma \vdash _{D}\ e:\sigma \Leftrightarrow \Gamma \vdash _{S}\ e:\sigma }$, which falls apart into two sub-proofs: consistency, ${\displaystyle \Gamma \vdash _{D}\ e:\sigma \Leftarrow \Gamma \vdash _{S}\ e:\sigma }$, and completeness, ${\displaystyle \Gamma \vdash _{D}\ e:\sigma \Rightarrow \Gamma \vdash _{S}\ e:\sigma }$.

While consistency can be seen by decomposing the rules ${\displaystyle [{\texttt {Let}}]}$ and ${\displaystyle [{\texttt {Var}}]}$ of ${\displaystyle \vdash _{S}}$ into proofs in ${\displaystyle \vdash _{D}}$, it is readily visible that ${\displaystyle \vdash _{S}}$ is incomplete, as one cannot show ${\displaystyle \lambda \ x.x:\forall \alpha .\alpha \rightarrow \alpha }$ in ${\displaystyle \vdash _{S}}$, for instance, but only ${\displaystyle \lambda \ x.x:\alpha \rightarrow \alpha }$. An only slightly weaker version of completeness is provable[6] though, namely

${\displaystyle \Gamma \vdash _{D}\ e:\sigma \Rightarrow \Gamma \vdash _{S}\ e:\tau \wedge {\bar {\Gamma }}(\tau )\sqsubseteq \sigma }$

implying that one can derive the principal type for an expression in ${\displaystyle \vdash _{S}}$, provided the result of the proof is generalized at the end.

Comparing ${\displaystyle \vdash _{D}}$ and ${\displaystyle \vdash _{S}}$, note that only monotypes now appear in the judgments of all rules.

### Degrees of freedom instantiating the rules

Within the rules themselves, assuming a given expression, one is free to pick the instances for (rule) variables not occurring in this expression. These are the instances for the type variables in the rules. Working towards finding the most general type, this choice can be limited to picking suitable types for ${\displaystyle \tau }$ in ${\displaystyle [{\texttt {Var}}]}$ and ${\displaystyle [{\texttt {Abs}}]}$. A suitable choice cannot be determined locally; its quality becomes apparent in the premises of ${\displaystyle [{\texttt {App}}]}$, the only rule in which two different types, namely the function's formal and actual parameter types, have to come together as one.

Therefore, the general strategy for finding a proof would be to make the most general assumption (${\displaystyle \alpha \not \in free(\Gamma )}$) for ${\displaystyle \tau }$ in ${\displaystyle [{\texttt {Abs}}]}$ and to refine this, and the choice to be made in ${\displaystyle [{\texttt {Var}}]}$, until all side conditions imposed by the ${\displaystyle [{\texttt {App}}]}$ rules are finally met. Fortunately, no trial and error is needed, since an effective method is known to compute all the choices: Robinson's unification in combination with the so-called union-find algorithm.

To briefly summarize the union-find algorithm: given the set of all types in a proof, it allows one to group them together into equivalence classes by means of a ${\displaystyle {\texttt {union}}}$ procedure and to pick a representative for each such class using a ${\displaystyle {\texttt {find}}}$ procedure. Emphasizing the word procedure in the sense of side effect, this clearly leaves the realm of logic in order to prepare an effective algorithm. The representative of a ${\displaystyle {\texttt {union}}(a,b)}$ is determined such that, if both ${\displaystyle a}$ and ${\displaystyle b}$ are type variables, the representative is arbitrarily one of them, while in uniting a variable and a term, the term becomes the representative. Assuming an implementation of union-find at hand, one can formulate the unification of two monotypes as follows:

    unify(ta,tb):
      ta = find(ta)
      tb = find(tb)
      if both ta,tb are terms of the form D p1..pn with identical D,n then
        unify(ta[i],tb[i]) for each corresponding ith parameter
      else
        if at least one of ta,tb is a type variable then
          union(ta,tb)
        else
          error 'types do not match'
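
The pseudocode above can be turned into a small runnable Python sketch. The classes `Var` and `Term` and the helpers `occurs` and `show` are assumptions introduced for this illustration. The `find` shown here uses path compression only and omits union by rank, and `union` includes an occurs check so that no recursive types are built.

```python
class Var:
    """Type variable node; `parent` links it to the rest of its class."""
    def __init__(self, name):
        self.name = name
        self.parent = None

class Term:
    """Application of a type function D to parameters, e.g. Term('->', a, b)."""
    def __init__(self, ctor, *args):
        self.ctor = ctor
        self.args = list(args)

def find(t):
    """Return the representative of t's class, compressing paths on the way."""
    if isinstance(t, Var) and t.parent is not None:
        t.parent = find(t.parent)
        return t.parent
    return t

def occurs(v, t):
    """True if variable v occurs inside the (resolved) type t."""
    t = find(t)
    if t is v:
        return True
    return isinstance(t, Term) and any(occurs(v, a) for a in t.args)

def union(a, b):
    """Merge two classes; a term, if present, becomes the representative."""
    if a is b:
        return
    if not isinstance(a, Var):
        a, b = b, a                 # make sure a is the variable
    if occurs(a, b):
        raise TypeError("recursive type")
    a.parent = b

def unify(ta, tb):
    ta, tb = find(ta), find(tb)
    if (isinstance(ta, Term) and isinstance(tb, Term)
            and ta.ctor == tb.ctor and len(ta.args) == len(tb.args)):
        for pa, pb in zip(ta.args, tb.args):
            unify(pa, pb)
    elif isinstance(ta, Var) or isinstance(tb, Var):
        union(ta, tb)
    else:
        raise TypeError("types do not match")

def show(t):
    """Render a type after resolving all variables to their representatives."""
    t = find(t)
    if isinstance(t, Var):
        return t.name
    if t.ctor == "->":
        return f"({show(t.args[0])} -> {show(t.args[1])})"
    return " ".join([t.ctor] + [show(p) for p in t.args])

# Unifying  a -> int  with  string -> b  resolves a to string and b to int.
a, b = Var("a"), Var("b")
unify(Term("->", a, Term("int")), Term("->", Term("string"), b))
print(show(a), show(b))   # string int
```

Because `union` only records a link from a variable to its representative, every other place that mentions the variable sees the refinement through `find`, without any explicit substitution.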


## Algorithm W

The presentation of Algorithm W as a set of rules with side effects not only deviates significantly from the original[4] but is also a gross abuse of the notation of logical rules. It is justified here because it allows a direct comparison with ${\displaystyle \vdash _{S}}$ while expressing an efficient implementation at the same time. The rules now specify a procedure with parameters ${\displaystyle \Gamma ,e}$ yielding ${\displaystyle \tau }$ in the conclusion, where the execution of the premises proceeds from left to right. As an alternative to a procedure, it could be viewed as an attribution of the expression.
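
In this rule style, the algorithm can be written as follows. The symbol ${\displaystyle \vdash _{W}}$ is introduced here only to distinguish the procedure from ${\displaystyle \vdash _{S}}$; the premises are to be read as steps executed from left to right, with ${\displaystyle unify}$ acting by side effect.

${\displaystyle {\begin{array}{cl}\displaystyle {\frac {x:\sigma \in \Gamma \quad \tau =inst(\sigma )}{\Gamma \vdash _{W}x:\tau }}&[{\texttt {Var}}]\\\displaystyle {\frac {\Gamma \vdash _{W}e_{0}:\tau _{0}\quad \Gamma \vdash _{W}e_{1}:\tau _{1}\quad \tau '=newvar\quad unify(\tau _{0},\ \tau _{1}\rightarrow \tau ')}{\Gamma \vdash _{W}e_{0}\ e_{1}:\tau '}}&[{\texttt {App}}]\\\displaystyle {\frac {\tau =newvar\quad \Gamma ,\ x:\tau \vdash _{W}e:\tau '}{\Gamma \vdash _{W}\lambda \ x.e:\tau \rightarrow \tau '}}&[{\texttt {Abs}}]\\\displaystyle {\frac {\Gamma \vdash _{W}e_{0}:\tau \quad \Gamma ,\ x:{\bar {\Gamma }}(\tau )\vdash _{W}e_{1}:\tau '}{\Gamma \vdash _{W}{\textbf {let}}\ x=e_{0}\ {\textbf {in}}\ e_{1}:\tau '}}&[{\texttt {Let}}]\\\end{array}}}$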

The procedure '${\displaystyle inst(\sigma )}$' specializes the polytype ${\displaystyle \sigma }$ by copying the term and replacing the bound type variables consistently by new monotype variables. '${\displaystyle newvar}$' produces a new monotype variable. Likewise, ${\displaystyle {\bar {\Gamma }}(\tau )}$ has to copy the type, introducing new variables for the qualification to avoid unwanted captures. Overall, the algorithm now proceeds by always making the most general choice, leaving the specialization to the unification, which by itself produces the most general result. As noted above, the final result ${\displaystyle \tau }$ has to be generalized to ${\displaystyle {\bar {\Gamma }}(\tau )}$ in the end, to obtain the most general type for a given expression.

Because the procedures used in the algorithm have nearly O(1) cost, the overall cost of the algorithm is, in practice, close to linear in the size of the expression for which a type is to be inferred. This is in strong contrast to many other attempts to derive type inference algorithms, which often turned out to be NP-hard, if not undecidable with respect to termination. Thus HM performs as well as the best fully informed type-checking algorithms can. Type-checking here means that an algorithm does not have to find a proof, but only to validate a given one.

The efficiency is slightly lowered for two reasons. First, the binding of type variables in the context has to be maintained to allow computation of ${\displaystyle {\bar {\Gamma }}(\tau )}$. Second, an occurs check has to be made to prevent the building of recursive types during ${\displaystyle union(\alpha ,\tau )}$. An example of such a case is ${\displaystyle \lambda \ x.(x\ x)}$, for which no type can be derived using HM. Because in practice types are only small terms and do not build up expanding structures, one can treat them in complexity analysis as being smaller than some constant, retaining O(1) costs.
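
The procedure can be sketched in Python as below. This is an illustrative sketch under several assumptions, not the original formulation: expressions are tagged tuples, contexts are dictionaries, and the names `Var`, `Term`, `Scheme`, `infer`, `show`, `prelude` and `both` are hypothetical. The `find` used here omits path compression, and `generalize` does not copy the type, relying instead on `inst` never modifying the qualified variables.

```python
class Var:                      # monotype variable
    parent = None               # set by unify; None while unresolved

class Term:                     # type function applied to parameters
    def __init__(self, ctor, *args):
        self.ctor, self.args = ctor, args

class Scheme:                   # polytype: qualified variables plus a body
    def __init__(self, bound, body):
        self.bound, self.body = bound, body

def find(t):
    while isinstance(t, Var) and t.parent is not None:
        t = t.parent
    return t

def free(t):
    """Unresolved variables occurring free in a monotype or polytype."""
    if isinstance(t, Scheme):
        return free(t.body) - set(t.bound)
    t = find(t)
    if isinstance(t, Var):
        return {t}
    return set().union(*(free(a) for a in t.args))

def unify(ta, tb):
    ta, tb = find(ta), find(tb)
    if ta is tb:
        return
    if isinstance(ta, Term) and isinstance(tb, Term):
        if ta.ctor != tb.ctor or len(ta.args) != len(tb.args):
            raise TypeError("types do not match")
        for pa, pb in zip(ta.args, tb.args):
            unify(pa, pb)
        return
    if not isinstance(ta, Var):
        ta, tb = tb, ta                      # make ta the variable
    if ta in free(tb):                       # occurs check
        raise TypeError("recursive type")
    ta.parent = tb                           # union: tb represents the class

def copy(t, mapping):
    t = find(t)
    if isinstance(t, Var):
        return mapping.get(t, t)
    return Term(t.ctor, *(copy(a, mapping) for a in t.args))

def inst(sigma):
    """Replace the qualified variables of sigma by new monotype variables."""
    return copy(sigma.body, {v: Var() for v in sigma.bound})

def generalize(gamma, tau):
    """Gamma-bar(tau): qualify the variables of tau not free in gamma."""
    in_gamma = set().union(*(free(s) for s in gamma.values()))
    return Scheme(list(free(tau) - in_gamma), tau)

def infer(gamma, e):
    """Return a monotype for expression e (a tagged tuple) under gamma."""
    if e[0] == "var":                                        # [Var]
        return inst(gamma[e[1]])
    if e[0] == "app":                                        # [App]
        t0, t1, res = infer(gamma, e[1]), infer(gamma, e[2]), Var()
        unify(t0, Term("->", t1, res))
        return res
    if e[0] == "abs":                                        # [Abs]
        tau = Var()
        return Term("->", tau, infer({**gamma, e[1]: Scheme([], tau)}, e[2]))
    if e[0] == "let":                                        # [Let]
        sigma = generalize(gamma, infer(gamma, e[2]))
        return infer({**gamma, e[1]: sigma}, e[3])
    raise ValueError(f"unknown expression: {e!r}")

def show(t, names=None):
    """Render a type, naming unresolved variables a, b, ... in order."""
    names = {} if names is None else names
    t = find(t)
    if isinstance(t, Var):
        return names.setdefault(t, chr(ord("a") + len(names)))
    if t.ctor == "->":
        return f"({show(t.args[0], names)} -> {show(t.args[1], names)})"
    return " ".join([t.ctor, *(show(a, names) for a in t.args)])

a, b = Var(), Var()
prelude = {
    "true": Scheme([], Term("bool")),
    "zero": Scheme([], Term("int")),
    "pair": Scheme([a, b], Term("->", a, Term("->", b, Term("pair", a, b)))),
}
ident = ("abs", "x", ("var", "x"))

def both(f):
    """The expression  pair (f true) (f zero)."""
    return ("app", ("app", ("var", "pair"), ("app", f, ("var", "true"))),
            ("app", f, ("var", "zero")))

print(show(infer(prelude, ident)))                                     # (a -> a)
print(show(infer(prelude, ("let", "f", ident, both(("var", "f"))))))   # pair bool int
```

With the same `prelude`, `infer(prelude, ("abs", "f", both(("var", "f"))))` raises `TypeError`, since a lambda-bound `f` stays monomorphic, and so does `("abs", "x", ("app", ("var", "x"), ("var", "x")))`, which fails the occurs check. The returned monotype would still have to be generalized with `generalize` to obtain the principal polytype.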

### Original presentation of Algorithm W

In the original paper,[4] the algorithm is presented more formally, using a substitution style instead of the side effects used in the method above. In the latter form, the side effect invisibly takes care of all places where a type variable is used. Explicitly using substitutions not only makes the algorithm hard to read, because the side effect occurs virtually everywhere, but also gives the false impression that the method might be costly. When the algorithm is implemented by purely functional means, or when it is to be proved basically equivalent to the deduction system, full explicitness is of course needed and the original formulation is a necessary refinement.

## Further topics

### Recursive definitions

A central property of the lambda calculus is that recursive definitions are not primitive, but can instead be expressed by a fixed point combinator. The original paper[4] notes that recursion can be realized by giving this combinator the type ${\displaystyle {\mathit {fix}}:\forall \alpha .(\alpha \rightarrow \alpha )\rightarrow \alpha }$. A possible recursive definition could thus be formulated as ${\displaystyle {\texttt {rec}}\ v=e_{1}\ {\texttt {in}}\ e_{2}\ ::={\texttt {let}}\ v={\mathit {fix}}(\lambda v.e_{1})\ {\texttt {in}}\ e_{2}}$.

Alternatively an extension of the expression syntax and an extra typing rule is possible as:

${\displaystyle \displaystyle {\frac {\Gamma ,\Gamma '\vdash e_{1}:\tau _{1}\quad \dots \quad \Gamma ,\Gamma '\vdash e_{n}:\tau _{n}\quad \Gamma ,\Gamma ''\vdash e:\tau }{\Gamma \ \vdash \ {\texttt {rec}}\ v_{1}=e_{1}\ {\texttt {and}}\ \dots \ {\texttt {and}}\ v_{n}=e_{n}\ {\texttt {in}}\ e:\tau }}\quad [{\texttt {Rec}}]}$

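where ${\displaystyle \Gamma '=v_{1}:\tau _{1},\ \dots ,\ v_{n}:\tau _{n}}$ and ${\displaystyle \Gamma ''=v_{1}:{\bar {\Gamma }}(\tau _{1}),\ \dots ,\ v_{n}:{\bar {\Gamma }}(\tau _{n})}$,

basically merging ${\displaystyle [{\texttt {Abs}}]}$ and ${\displaystyle [{\texttt {Let}}]}$ while including the recursively defined variables in monotype positions where they occur to the left of the ${\displaystyle {\texttt {in}}}$, but as polytypes to the right of it. This formulation perhaps best summarizes the essence of let-polymorphism.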

## Notes

1. Polytypes are called "type schemes" in the original article.
2. The parametric types ${\displaystyle D\ \tau \dots \tau }$ were not present in the original paper on HM and are not needed to present the method. None of the inference rules below takes care of or even notes them. The same holds for the non-parametric "primitive types" in said paper. All the machinery for polymorphic type inference can be defined without them. They have been included here for the sake of the examples, but also because the nature of HM is all about parametric types. This stems from the function type ${\displaystyle \tau \rightarrow \tau }$, hard-wired in the inference rules below, which already has two parameters and has been presented here as only a special case.

## References

1. Hindley, R. (1969). "The Principal Type-Scheme of an Object in Combinatory Logic". Transactions of the American Mathematical Society, Vol. 146, pp. 29–60.
2. Milner, R. (1978). "A Theory of Type Polymorphism in Programming". Journal of Computer and System Science (JCSS), Vol. 17, pp. 348–374.
3. Damas, L. (1985). Type Assignment in Programming Languages. PhD thesis, University of Edinburgh (CST-33-85).
4. Damas, L.; Milner, R. (1982). "Principal type-schemes for functional programs". 9th Symposium on Principles of Programming Languages (POPL'82), ACM, pp. 207–212.
5. Clement (1987). "The Natural Dynamic Semantics of Mini-Standard ML". TAPSOFT'87, Vol. 2. LNCS, Vol. 250, pp. 67–81.
6. Vaughan, J. "A proof of correctness for the Hindley–Milner type inference algorithm".