Convert integer to words (QBASIC)

From LiteratePrograms
Other implementations: Java | QBASIC

Here is a function that converts a number into words. It supports British English ("en-uk"), US English ("en-us") and French ("fr"); other languages are not supported yet. Please note that my French is pretty rusty, so I can't guarantee that the current algorithm produces valid French in all cases. The assistance of a fluent French speaker in adding more test cases would be very much appreciated.

The function is split into a "run once" initialisation portion followed by the part which calculates the actual words required to do the job. These portions will be discussed in a little more detail later in the article.

FUNCTION Num2Lang$ (aNumber AS LONG, aLang AS STRING)
<<variable declarations>>
<<initialisation call>>
<<function body>>
   EXIT FUNCTION
<<initialisation block>>
END FUNCTION

Two types of scope are used for the variables in this function. DIMensioned variables are automatically reinitialised to their default values whenever the function is called. STATIC variables retain their value between calls to the function. They are not strictly necessary in this function but their use makes the function faster by avoiding the need to set up the vocabulary arrays each time the function is called.

<<variable declarations>>=

   STATIC dLang AS STRING, dLog10 AS DOUBLE   ' language currently loaded; cached LOG(10)
   STATIC dUnits() AS STRING, dTens() AS STRING, dPowers() AS STRING

   DIM dBuffer AS STRING, dDigitGroup AS LONG, dRange AS INTEGER

Note that QBASIC doesn't allow for the sizing of a static array at the time of its declaration. This has to be done later.
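The difference between the two kinds of scope, and the late sizing of a static array, can be seen in a small standalone sketch. The function NextWord$ and its variables are hypothetical and are not part of Num2Lang; the sketch assumes it is called no more than twice.

```basic
DECLARE FUNCTION NextWord$ ()

PRINT NextWord$   ' first call: sizes and fills the array, returns "one"
PRINT NextWord$   ' second call: the array is still loaded, returns "two"

FUNCTION NextWord$
   STATIC sWords() AS STRING   ' declared without a size
   STATIC sCalls AS INTEGER    ' keeps its value between calls
   DIM dFresh AS INTEGER       ' starts again at 0 on every call

   IF sCalls = 0 THEN
      REDIM sWords(2)          ' sized later, on first use
      sWords(1) = "one": sWords(2) = "two"
   END IF
   sCalls = sCalls + 1
   dFresh = dFresh + 1         ' always 1 at this point
   NextWord$ = sWords(sCalls)
END FUNCTION
```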

Basically the vocabulary arrays are loaded with data before use on the first call to the function and then just used on subsequent calls. The only case in which these arrays will need to be reloaded is when the language used is changed.

Finally, if you have misgivings about the use of the GOSUB command in this piece of code, please read on. Its use is discussed later in this article.

<<initialisation call>>=
   IF dLang <> aLang THEN
      GOSUB Num2LangInit
      dLang = aLang
   END IF

After the soup and salad we come to the meat of the function. In the two (and a half) languages covered so far we only really need to deal with three cases: zero to nineteen, twenty to ninety-nine, and everything else. The specification for this task (the bottles of beer song) implies that we only need to deal with non-negative whole numbers, and I have arbitrarily decided that I only want to deal with 32-bit signed integers (the LONG type in BASIC), so "everything else" means any whole number between one hundred and 2^31-1. It would be trivial to handle negative numbers and not too much work to handle decimals, but there's no need in this case. However, with an eye to bugs or future expansion, a fourth case has been added to handle numbers outside the range.

<<function body>>=
   SELECT CASE aNumber
<<first case>>
<<second case>>
<<third case>>
<<other cases>>
   END SELECT
   Num2Lang = dBuffer


The first case contains a highly idiosyncratic set of numbers with little or no pattern and the easiest way to handle it is via the pre-initialised lookup table, dUnits.

<<first case>>=
   CASE 0 TO 19
      dBuffer = dUnits(INT(aNumber))

The second case is pretty straightforward for English but handling French adds two minor complications. Firstly some ten-words, "septante" for instance, don't exist. In those cases the base twenty system has to be used starting from the previous existing ten-word. Hence the first IF/ENDIF section in the following code. Secondly umpty-one values have to be treated specially by adding the word "et" in between the tens and the units. Hence the second IF/ENDIF section.

<<second case>>=
   CASE 20 TO 99
      dDigitGroup = INT(aNumber / 10)
      dBuffer = dTens(dDigitGroup)
      IF dBuffer = "" THEN
         dDigitGroup = dDigitGroup - 1
         dBuffer = dTens(dDigitGroup)
      END IF
      dDigitGroup = aNumber - dDigitGroup * 10
      IF aLang = "fr" AND dDigitGroup MOD 10 = 1 THEN
         dBuffer = dBuffer + " et " + Num2Lang(dDigitGroup, aLang)
      ELSEIF dDigitGroup > 0 THEN
         dBuffer = dBuffer + "-" + Num2Lang(dDigitGroup, aLang)
      END IF
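The base-twenty fallback can be tried in isolation with a minimal standalone sketch. The array and variable names are hypothetical, and only the two table entries needed for the example are filled in.

```basic
' Hypothetical standalone sketch of the fallback to the previous ten-word.
DIM tens(9) AS STRING
DIM n AS LONG, group AS LONG
tens(6) = "soixante": tens(8) = "quatre-vingts"

n = 79
group = n \ 10                               ' 7, but tens(7) is empty
IF tens(group) = "" THEN group = group - 1   ' fall back to 6, "soixante"
PRINT tens(group); " + "; LTRIM$(STR$(n - group * 10))   ' soixante + 19
```

The remainder of 19 is then looked up in the units table, which is how 79 becomes "soixante-dix-neuf".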

The third case has the most complex code. The basic idea is to identify which range the number falls into (hundreds, thousands, millions, etc.) and then use recursive calls to get the text for groups of up to three digits. That simple picture is clouded a little by the first range, the hundreds, which are treated a little differently in US English from other English variants. It's also complicated by French, which uses the words "cent" and "mille" rather than "un cent" or "un mille" for 100 and 1,000, but luckily that doesn't add too much complication.

Also note the addition of .4 to the number when calculating the range. This shouldn't have been necessary but a floating point approximation error leads to the wrong value being calculated for 100 if it isn't present. The calculation worked for all other values, even without the addition but them's the breaks.

<<third case>>=
   CASE 100 TO 2147483647
      dRange = INT(LOG(aNumber + .4) / dLog10)
      IF dRange > 3 THEN dRange = INT(dRange / 3) * 3
      dDigitGroup = INT(aNumber / 10 ^ dRange)
      IF aLang = "fr" AND dDigitGroup = 1 AND dRange < 5 THEN
         dBuffer = ""
      ELSE
         dBuffer = Num2Lang(dDigitGroup, aLang)
      END IF
      dBuffer = LTRIM$(dBuffer + dPowers(dRange))
      dDigitGroup = aNumber - dDigitGroup * 10 ^ dRange
      IF dDigitGroup > 0 THEN
         IF dDigitGroup < 100 AND aLang = "en-uk" THEN
            dBuffer = dBuffer + " and"
         END IF
         dBuffer = dBuffer + " " + Num2Lang(dDigitGroup, aLang)
      END IF
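The range calculation on its own can be explored with the following rough sketch. It is a standalone program with hypothetical variable names, and it calls LOG(10) directly instead of using the cached dLog10.

```basic
' Hypothetical standalone sketch of the range calculation.
DIM n AS LONG, r AS INTEGER, i AS INTEGER
FOR i = 1 TO 4
   READ n
   r = INT(LOG(n + .4) / LOG(10))        ' number of digits minus one
   IF r > 3 THEN r = INT(r / 3) * 3      ' snap 4 and 5 to 3, 7 and 8 to 6
   PRINT n; "range"; r; "leading group"; INT(n / 10 ^ r)
NEXT i
DATA 100, 1958, 25000, 1234567
```

For 25000 the raw range of 4 is meant to be snapped down to 3, which gives a leading group of 25 and a remainder of 0, so the function says "twenty-five" followed by the word for 10^3.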

Finally a default case was added during development to handle cases which hadn't been handled yet. If the code is extended to handle negative or floating point numbers at some time in the future this might come in handy again, so it has been left. At the moment it will catch negative numbers and produce a "reasonable" answer which will at least indicate that there is a problem in the input.

<<other cases>>=
   CASE ELSE
      dBuffer = LTRIM$(STR$(aNumber))
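The LTRIM$ is there because STR$ puts a leading space in front of a non-negative number, where the minus sign would otherwise go. A short standalone illustration:

```basic
PRINT "[" + STR$(42) + "]"           ' [ 42]  leading space in place of a sign
PRINT "[" + LTRIM$(STR$(42)) + "]"   ' [42]
PRINT "[" + LTRIM$(STR$(-7)) + "]"   ' [-7]   the sign itself is kept
```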

Now we have the initialisation code for the function. It loads the arrays with the vocabulary required for the current language. It also sets dLog10 to the natural logarithm of 10. This is required because QBASIC's built-in LOG function deals in natural logarithms and we need base 10 logarithms, calculated as LOG(x) / LOG(10), to identify the right powers-of-ten word.

Just a word on the use of GOSUB and a label here. Many people recoil in horror from the GOSUB command nowadays with some vague fear that it is the GOTO command in disguise and that therefore its use is "unstructured". In fact it has been removed altogether from the latest incarnation of BASIC, VB.NET, and that is a pity. There is no doubt that GOSUB in the wrong hands can be misused badly. However it has at least one legitimate use and that use is the provision of structuring within a function or subroutine where the creation of extra functions or subroutines to carry out that structuring would be overkill. That is how it has been used here. While it could have been replaced altogether in this function, its use makes the code more readable than it would otherwise have been and thus its use is justified.
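As a minimal sketch of this pattern, the following standalone program uses GOSUB for one-off set-up inside a function. The function Greeting$, the label GreetingInit and the variable sPrefix are hypothetical names chosen for illustration only.

```basic
DECLARE FUNCTION Greeting$ (aName AS STRING)

PRINT Greeting$("world")

FUNCTION Greeting$ (aName AS STRING)
   STATIC sPrefix AS STRING
   IF sPrefix = "" THEN GOSUB GreetingInit   ' run the set-up once only
   Greeting$ = sPrefix + aName
   EXIT FUNCTION                 ' do not fall through into the subroutine

GreetingInit:
   sPrefix = "Hello, "
   RETURN                        ' back to the statement after the GOSUB
END FUNCTION
```

The label is local to the function, so the set-up code can share the function's STATIC variables without their being passed as parameters.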

<<initialisation block>>=
Num2LangInit:
   REDIM dUnits(19), dTens(9), dPowers(9)
   SELECT CASE aLang
   CASE "fr"
      dUnits(0) = "zero": dUnits(10) = "dix": dTens(0) = "": dPowers(0) = ""
      dUnits(1) = "un": dUnits(11) = "onze": dTens(1) = "": dPowers(1) = ""
      dUnits(2) = "deux": dUnits(12) = "douze": dTens(2) = "vingt": dPowers(2) = " cent"
      dUnits(3) = "trois": dUnits(13) = "treize": dTens(3) = "trente": dPowers(3) = " mille"
      dUnits(4) = "quatre": dUnits(14) = "quatorze": dTens(4) = "quarante": dPowers(4) = ""
      dUnits(5) = "cinq": dUnits(15) = "quinze": dTens(5) = "cinquante": dPowers(5) = ""
      dUnits(6) = "six": dUnits(16) = "seize": dTens(6) = "soixante": dPowers(6) = " million"
      dUnits(7) = "sept": dUnits(17) = "dix-sept": dTens(7) = "": dPowers(7) = ""
      dUnits(8) = "huit": dUnits(18) = "dix-huit": dTens(8) = "quatre-vingts": dPowers(8) = ""
      dUnits(9) = "neuf": dUnits(19) = "dix-neuf": dTens(9) = "": dPowers(9) = " milliard"
   CASE ELSE
      dUnits(0) = "zero": dUnits(10) = "ten": dTens(0) = "": dPowers(0) = ""
      dUnits(1) = "one": dUnits(11) = "eleven": dTens(1) = "": dPowers(1) = ""
      dUnits(2) = "two": dUnits(12) = "twelve": dTens(2) = "twenty": dPowers(2) = " hundred"
      dUnits(3) = "three": dUnits(13) = "thirteen": dTens(3) = "thirty": dPowers(3) = " thousand"
      dUnits(4) = "four": dUnits(14) = "fourteen": dTens(4) = "forty": dPowers(4) = ""
      dUnits(5) = "five": dUnits(15) = "fifteen": dTens(5) = "fifty": dPowers(5) = ""
      dUnits(6) = "six": dUnits(16) = "sixteen": dTens(6) = "sixty": dPowers(6) = " million"
      dUnits(7) = "seven": dUnits(17) = "seventeen": dTens(7) = "seventy": dPowers(7) = ""
      dUnits(8) = "eight": dUnits(18) = "eighteen": dTens(8) = "eighty": dPowers(8) = ""
      dUnits(9) = "nine": dUnits(19) = "nineteen": dTens(9) = "ninety": dPowers(9) = " billion"
   END SELECT
   dLog10 = LOG(10)
   RETURN

The next piece of the file is a scaffold which you can use to test the Num2Lang function. When there are so many ways that things can go wrong, it's important to automate the testing process so that the same tests are run every time.

The floating point approximation error discussed above demonstrates the need for comprehensive testing. There was no logical error in the code before the "+ .4" was added to it. Nevertheless the function did not return the correct result when the input value was 100, so the cause had to be identified and a workaround created. Comprehensive unit testing will find this sort of error where logic and code writing skills will not.

<<unit tests>>=


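DECLARE FUNCTION Num2Lang$ (aNumber AS LONG, aLang AS STRING)
DIM mTest AS STRING, mLang AS STRING, mExpected AS STRING
DIM mGot AS STRING, mStatus AS STRING, mNumber AS LONG

mStatus = ""
DO WHILE mStatus = ""
   READ mTest
   IF mTest = "" THEN
      ' An empty marker ends the test data.
      mStatus = "All tests succeeded"
   ELSE
      READ mLang, mNumber, mExpected
      mGot = Num2Lang$(mNumber, mLang)
      PRINT LEFT$(mLang + ": " + LTRIM$(STR$(mNumber)) + SPACE$(15), 15) + "'" + mGot + "'"
      IF mExpected <> mGot THEN
         mStatus = "Last test failed (Expected '" + mExpected + "')"
      END IF
   END IF
LOOP
PRINT mStatus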

   DATA "*","en-uk",0,"zero"
   DATA "*","en-uk",1,"one"
   DATA "*","en-uk",9,"nine"
   DATA "*","en-uk",10,"ten"
   DATA "*","en-uk",11,"eleven"
   DATA "*","en-uk",19,"nineteen"
   DATA "*","en-uk",20,"twenty"
   DATA "*","en-uk",21,"twenty-one"
   DATA "*","en-uk",100,"one hundred"
   DATA "*","en-uk",101,"one hundred and one"
   DATA "*","en-us",101,"one hundred one"
   DATA "*","en-uk",1000,"one thousand"
   DATA "*","en-uk",1001,"one thousand and one"
   DATA "*","en-uk",1958,"one thousand nine hundred and fifty-eight"
   DATA "*","fr",10,"dix"
   DATA "*","fr",11,"onze"
   DATA "*","fr",21,"vingt et un"
   DATA "*","fr",22,"vingt-deux"
   DATA "*","fr",29,"vingt-neuf"
   DATA "*","fr",60,"soixante"
   DATA "*","fr",61,"soixante et un"
   DATA "*","fr",62,"soixante-deux"
   DATA "*","fr",71,"soixante et onze"
   DATA "*","fr",79,"soixante-dix-neuf"
   DATA "*","fr",80,"quatre-vingts"
   DATA "*","fr",90,"quatre-vingts-dix"
   DATA "*","fr",99,"quatre-vingts-dix-neuf"
   DATA "*","fr",100,"cent"
   DATA "*","fr",101,"cent un"
   DATA "*","fr",999,"neuf cent quatre-vingts-dix-neuf"
   DATA "*","fr",1000,"mille"
   DATA "*","fr",1100,"mille cent"
   DATA "*","fr",1000000,"un million"
   DATA ""
unit tests