Unlambda abstraction elimination (C Plus Plus)

From LiteratePrograms
Jump to: navigation, search
Other implementations: C++ | Sed

This program does Unlambda abstraction elimination. Lambda syntax is the same as on the Unlambda home page, except that the dollar signs are omitted. That is, instead of e.g. ^x`$xi you just write ^x`xi (which results in ``si`ki). Currently, whitespace is not allowed in the expression.

Contents

[edit] Checking the validity of the Unlambda + lambda expression

Before the program does the abstraction elimination, it first checks that the expression is indeed a valid input expression, that is, if it indeed contains only lambda terms and valid Unlambda expressions.

[edit] Checking the validity of partial expressions

Since the arguments to lambdas are always functions (because there's nothing else but functions in Unlambda), built-in single-letter functions and lambda arguments can be treated exactly the same inside the lambda function body. To exploit this fact, the actual work is done by a recursive function, valid_with, which gets passed the function symbols which are currently valid, as well as the start index of the expression. On successful validation, the function returns true, and the index argument is updated to the first index after the fully validated expression. If the expression is invalid, the function returns false.

<<check validity>>=
// test if the partial expression starting at index is valid Unlambda
// + lambda expressions if the valid functions are the one-character
// functions given in the second argument plus the special functions
// .x and ?x. The index is updated to the end of the tested
// expression.
bool valid_with(std::string const& expression,
                std::string const& functions,
                std::string::size_type& index)
{
  valid_with function body
}

There are three characters in Unlambda with special meaning: The application operator (backtick, `) and the two starters for two-letter functions (the dot for character output and the question mark for input character testing). The unlambda-lambda notation adds a fourth one, the caret, for introducing a lambda expression. Those must be excluded as lambda argument names (while it's unlikely that anyone would use a non-letter as argument name, there's no need to forbid that in general), in addition to the already used one-letter function names. The following string constant contains those special characters:

<<valid_with function body>>=
static std::string const special = "`.?^";

As described on the Unlambda homepage, the validity of an Unlambda program can be checked quite easily by counting the backticks (function applications) and function names. If you start a counter with 1, increment it for every backtick, and decrement it for every function application, then you have a valid complete Unlambda expression as soon as your counter reaches zero, the stoplevel. The variable levels is exactly this counter. Processing goes on as long as the level is above the stoplevel, and there's something left to process (i.e. the index didn't reach the end of the string).

<<valid_with function body>>=
int const stoplevel = 0;

int level = 1;
while (level > stoplevel && index < expression.size())
{
  valid_with loop body
}

We have a valid expression if we have reached the stoplevel:

<<valid_with function body>>=
return level == stoplevel;

In the loop, the validity of the expression is checked unit-by-unit, and the index and level are updated accordingly. The units are

  • for two-letter functions, the two letters making up that function,
  • for lambda expressions, the complete expression,
  • in all other cases one letter

Since the backtick has to be treated differently for the level, we get four different cases:

<<valid_with loop body>>=
switch (char c = expression[index])
{
case '`':
  {
    handle backtick
    break;
  }
case '.':
case '?':
  {
    handle two-letter functions
    break;
  }
case '^': // lambda
  {
    handle lambda
    break;
  }
default:
  {
    handle other functions
    break;
  }
}

Handling a backtick is simple: We just increment both the level and the current index by one.

<<handle backtick>>=
++level;
++index;

Handling one-letter functions isn't too hard either: We first check that we have a valid function name, and immediatly return false if we don't (since we've already found an error, any further checking would be unnecessary). If we found a valid function, we decrement the level, and increment the index.

<<handle other functions>>=
if (functions.find(c) == std::string::npos)
  return false;
--level;
++index;

Handling two-letter functions isn't too hard either. The only complication is that we have take care that we didn't reach the end of the string with the first letter; in that case we certainly have an invalid expression. In that case, we immediatly return false; otherwise we just skip over the next character and decrease the level.

<<handle two-letter functions>>=
++index;
if (index >= expression.size())
  return false;
--level;
++index;

The only really interesting case is checking the lambda functions. In a first step, we get the function argument name, which directly follows the caret. Of course we first must make sure that it actually exists.

<<handle lambda>>=
++index;
if (index >= expression.size())
  return false;
char const arg = expression[index];

Next we check that it's a valid function name which isn't yet taken.

<<handle lambda>>=
// the argument may not be a valid function name or special symbol
if ((functions + special).find(arg) != std::string::npos)
  return false;

Then we check that what follows is again a valid Unlambda+lambda expression, except that now our argument is also an allowed function. For this we call the function valid_with recursively starting with the next position, and return false if the check fails. Note that we don't have to check for reaching the end here, because the recursively called valid_with will not enter the loop in that case, and therefore simply return false without even touching the passed expression string.

<<handle lambda>>=
++index;
if (!valid_with(expression, functions + arg, index))
  return false;

Since a lambda is just a function as well, we finally decrement the level. Note that we do not have to update the index, because the recursive call already has done that.

<<handle lambda>>=
--level;

[edit] Checking the complete Unlambda+lambda expression

A string contains a valid Unlambda+lambda expression exactly if a valid expression starts at the beginning and reaches the end.

<<check validity>>=
// test if the expression is valid Unlambda + lambda expression
bool valid(std::string const& expression)
{
  std::string::size_type index = 0;
  // the second string contains all elementary Unlambda 2.0 functions
  // except for the two-character functions .x and ?x (those need
  // special handling)
  return valid_with(expression, "skivrdce|", index)
    && index == expression.size();
}

[edit] Doing the abstraction elimination

The function eliminate_abstraction is responsible for abstraction elimination. It takes advantage of the fact that the validity of the expression is already verified. Therefore certain indexes don't have to be checked, because if they were not valid, nor would be the expression.

The function takes the Unlambda+lambda expression as argument and returns the corresponding Unlambda expression.

<<abstraction elimination main function>>=
// eliminate abstraction (assumes a valid expression)
std::string eliminate_abstraction(std::string expression)
{
  eliminate_abstraction function body
}

As described on the Unlambda homepage, lambda expression have to be eliminated from the inside out, i.e. if a lambda expression contains another lambda expression, abstraction elimination has to be done on the inner lambda expression first. Since the lambda symbol (^ in our case) is at the start of the lambda expression, this condition can easily fulfilled by simply always eliminating the lambda expression which starts last, that is, the one starting at the last caret:

<<eliminate_abstraction function body>>=
std::string::size_type pos;
while ((pos = expression.rfind('^')) != std::string::npos)
{
  eliminate_abstraction main loop body
}
return expression;

A lambda has always the syntax

^ argument-name lambda-body

where the argument name is just a single letter. The first character of the lambda body determines what has to be done.

<<eliminate_abstraction main loop body>>=
char const argument_name = expression[pos+1];
char const what = expression[pos+2];
if (what == argument_name)
{
  lambda body is argument name
}
else if (what == '`')
{
  lambda body is function application
}
else if (what == 'v')
{
  lambda body is v function
}
else
{
  lambda body is other function
}

[edit] Handling of non-applications

If the lambda body is again the argument name, the lambda expression is just the identity function, i:

<<lambda body is argument name>>=
expression.replace(pos, 3, "i");

If the lambda body is some other function, we have to "quote" it with `k:

<<lambda body is other function>>=
expression.replace(pos, 2, "`k");

The exception to this rule is the v function, because `kv and v are really the same function: They ignore their argument and return v. Thus v can be abstracted to itself:

<<lambda body is v function>>=
expression.erase(pos, 2);

[edit] Handling of function applications

Handling function applications is more complicated, because we want all the shortcuts described on the Unlambda homepage to be applied, so our expressions don't get too large. The first step is to determine the applied function and its argument. The applied function starts at pos+3, its argument directly follows. The function find_end gives the first index after the Unlambda subexpression starting at the position given as second argument. Note that since we look at the last lambda expression, those subexpressions are valid unlambda expressions, except that the lambda argument may be used as function name.

<<lambda body is function application>>=
std::string::size_type
  end1 = find_end(expression, pos+3),
  end2 = find_end(expression, end1);

The function find_end simply implements the algorithm described on the lambda homepage for checking the validity of Unlambda expressions, i.e. start with 1, increment for every application operator, decrement for every function name. However since we already know that we have a valid expression (we checkedd it earlier), we take reaching zero as indication that we reached the end of the subexpression.

Note that we do not have to handle lambdas, because all lambdas in the part we are operating on already have been expanded. Two-character functions get a special handling by simply skipping the second character.

<<abstraction elimination helper functions>>=
// find the end of a subexpression (assumes a valid Unlambda
// expression starting at pos)
std::string::size_type find_end(std::string const& expr,
                                std::string::size_type pos)
{
  int level = 1;
  while (level > 0)
  {
    switch(expr[pos])
    {
    case '`':
      ++level;
      break;
    case '.':
    case '?':
      ++pos;
      // fall through
    default:
      --level;
    }
    ++pos;
  }
  return pos;
}

[edit] The constant shortcut

Now we check whether any shortcuts can be used. The first one is the constant shortcut, which allows to simply "quote" the whole lambda body with `k.

<<lambda body is function application>>=
if (safe_for_constant_shortcut(expression, pos+2, end2, argument_name))
{
  expression.replace(pos, 2, "`k");
}

The actual check is done in the function safe_for_constant_shortcut. It takes the lambda body (in form of the complete expression and the start and end index of the lambda body), and the name of the lambda argument.

The function implements the check described on the unlambda homepage, except that i and v are also accepted (because they can be implemented using s and k alone).

The idea is to go through the string, count the application operators in front of each function name, and return false if there are too many of them. Or in other words, you accept all functions, except the lambda argument, as long as they don't have been given all their arguments.

<<abstraction elimination helper functions>>=
bool safe_for_constant_shortcut(std::string const& expression,
                                std::string::size_type start,
                                std::string::size_type end,
                                char arg)
{
  int applications = 0;
  for (; start < end; ++start)
  {
    switch (expression[start])
    {
    case '`':
      ++applications;
      break;
    non-application cases
    }
  }
  return true;
}

The s function takes three arguments. If there are less, it's mere currying, and therefore harmless.

<<non-application cases>>=
case 's':
  {
    if (applications > 2)
      return false;
    applications = 0;
  }
  break;

The k function takes two arguments. So we are safe as long as we don't have more than one application.

<<non-application cases>>=
case 'k':
  {
    if (applications > 1)
      return false;
    applications = 0;
  }
  break;

The function i gets special handling: If applied, it just vanishes; therefore an applied i reduces the number of pending applications by one. A non-applied i does nothing and is completely harmless.

<<non-application cases>>=
case 'i':
  {
    if (applications > 1)
      --applications;
  }
  break;

Also d gets special handling: It delays evaluation of its argument; therefore if there's exactly one application and its argument doesn't contain the lambda argument, the whole argument can be skipped. If there's another application, its argument is evaluated, and therefore in that situation it acts exactly like i.

<<non-application cases>>=
case 'd':
  {
    if (applications == 1)
    {
      std::string::size_type end = find_end(expression, start+1) - 1;
      if (expression.find(arg, start+1) < end)
        return false;
      start = end-1;
    }
    else if (applications > 1)
      --applications;
  }
  break;

Since v just returns itself, regardless if its argument, applications are always safe. However, unlike with d, the safety of its argument must still be checked.

<<non-application cases>>=
case 'v':
  applications = 0;
  break;

Any other functions are not safe if they are applied; the argument itself even isn't safe if not applied. Also, for the two-character functions the second character must be skipped.

<<non-application cases>>=
default:
  if (applications > 0 || expression[start] == arg)
    return false;
  if (expression[start] == '.' || expression[start] == '?')
    ++start;
  applications = 0;

[edit] The function shortcut

The next checked shortcut is the function shortcut, i.e. replacing ^x`Fx with F. This applies only if the function argument is the lambda argument, and the function itself is safe for the shortcut.

<<lambda body is function application>>=
else if (end2 == end1+1
         && expression[end1] == argument_name
         && safe_for_function_shortcut(expression, pos+3, end1, argument_name))
{
  expression.replace(pos, end2-pos, expression.substr(pos+3,end1-pos-3)); // function call shortcut
}

Again, the safety test is done in a separate function, which receives the subexpression describing the applied function (again, through the start and end index into the complete expression) and the lambda argument name. A function is safe if it contains no side effects, no delay, and no lambda argument.

This function assumes that start is nonzero and indeed denotes the start of a function.

<<abstraction elimination helper functions>>=
bool safe_for_function_shortcut(std::string const& expression,
                                std::string::size_type start,
                                std::string::size_type end,
                                char arg)
{
  safe_for_function_shortcut function body
}

The list of unsafe functions gets stored in a string. For the output function, only the initial dot is stored.

<<safe_for_function_shortcut function body>>=
std::string unsafe_chars = std::string(".rde@|") + arg;

On the first sight, the problem looks simple: Just look whether one of the unsafe function characters occurs, and if so, return failure. However, the character can be the second character of a two-character function, where it doesn't denote a function, and the function ?X is actually safe. Therefore we have to loop through the string, checking each "offending" character if it is "quoted" by a ?, until we either found an "unquoted" one, or have checked all problematic characters.

<<safe_for_function_shortcut function body>>=
std::string::size_type pos = start;
while (pos < end)
{
  safe_for_function_shortcut loop body
}
return true; // no unquoted bad character found

First, we look for the next "offending" character. If that comes only after the end of our subexpression or doesn't occur at all (in which std::string::npos is returned, which is larger than all other possible values), we are ready.

<<safe_for_function_shortcut loop body>>=
std::string::size_type failpos = expression.find_first_of(unsafe_chars, pos);

if (failpos >= end)
  break; // no bad character found

If we have found an offending character, we have to check whether it is quoted by "?" (if it were quoted by ".", we would have found that one instead, because it's one of the unsafe functions). But it doesn't suffice to check whether there's a question mark just before, because that question mark might have been quoted by another question mark (again, we don't have to care about dot-quoting, because any unquoted dot would have been found before). Because every second question mark is quoted by the one before it, the offending character is unquoted exactly if the number of question marks preceding it is even. In that case, it is not safe to apply the function shortcut, and we return immediatly.

<<safe_for_function_shortcut loop body>>=
std::string::size_type questionpos = failpos-1;
while (questionpos >= start && expression[questionpos] == '?')
  --questionpos;
if ((failpos - questionpos) % 2 == 1) // failpos - questionpos = 1 + number of question marks
  return false;                       // an even number of question marks means the character is not quoted

If the offending character was quoted, we continue our search at the character following it.

<<safe_for_function_shortcut loop body>>=
pos = failpos+1;

[edit] Default handling of function applications

If none of the shortcuts applies, the default replacement rule ^x`FG -> ``s^xF^xG is applied:

<<lambda body is function application>>=
else
{
  expression.replace(pos, end2-pos,
                     std::string("``s^") + argument_name
                     + expression.substr(pos+3, end1-pos-3)
                     + "^" + argument_name
                     + expression.substr(end1, end2-end1));
}

Note that this produces new lambda expressions, which will be eliminated in the following iterations of the loop.

[edit] Putting things together

The helper functions come before the main function, so we don't need forward declarations.

<<eliminate abstraction>>=

abstraction elimination helper functions

abstraction elimination main function

[edit] The main program

The main program just contains a simple loop which reads an Unlambda+lambda expression, checks its validity, and if the expression is valid, does abstraction elemination and prints the result. The loop is terminated by a line containing only a single q (which isn't a valid unlambda program), or by end of file or read error.

<<uae.cc>>=
// ************************************
// *                                  *
// * Unlambda abstraction elimination *
// *                                  *
// ************************************

#include <iostream>
#include <string>

check validity

eliminate abstraction

int main()
{
  std::string expr;
  while(std::cout << "expression: "
        && std::getline(std::cin, expr)
        && expr != "q")
  {
    if (valid(expr))
      std::cout << "result: " << eliminate_abstraction(expr) << "\n\n";
    else
      std::cout << "invalid expression\n\n";
  }
}
Download code
hijacker
hijacker
hijacker
hijacker