Finally, RPG can parse JSON as easily as it can XML! This month, I’ll take a look at RPG’s brand-new DATA-INTO opcode and how you can use it today in your RPG programs. I’ve combined DATA-INTO with the free YAJL tool to let you parse JSON documents with DATA-INTO in a manner very similar to the way you’ve been using XML-INTO for XML documents.

I’ll also take a moment to ask you to vote in COMMON’s Board of Directors election. I’m running for the Board this year, alongside some other great leaders in our community. Please spread the word and vote!

Board of Directors Election

I need your vote! I am running for COMMON’s Board of Directors, and this is an elected position voted upon by the COMMON membership. Why am I running? I’ve had a love affair with the IBM i community for a long time, and since 2001, I have tried very hard to bring value through articles, free software, online help and giving talks on IBM i subjects. To me, COMMON is one of the main organizations that represents the community, and I feel that this is one more way that I can provide value and give back.

Voting is open from April 22nd to May 22nd. You can learn more about my position as well as the other candidates by clicking here.

To place your vote, you’ll need to log into Cosmo and click the “Board of Directors Election” at the top of the page.

Please vote and help me spread the word! It means a lot!

RPG’s DATA-INTO Opcode: The Concept

Years ago, IBM added the XML-INTO opcode to RPG, which greatly simplified reading XML documents by automatically loading them into a matching RPG variable. Unfortunately, it only works with XML documents, and today, JSON has become even more popular than XML. This led me and many other RPGers to ask IBM for a corresponding JSON-INTO opcode. Instead, IBM had a better idea: DATA-INTO. DATA-INTO is an opcode that can potentially read any data format (with a little help) and place it into an RPG variable. That way, if the industry decides to change to another format in the future, RPG already has it covered.

I expect that you’re thinking what I was thinking: “There are millions of data formats! No single opcode could possibly read them all!” DATA-INTO solves that problem by working together with a third-party routine called a “parser”. The parser is what reads and understands data formats such as JSON or whatever might replace it in the future. It then passes that data back to DATA-INTO as parameters, and DATA-INTO maps the data into RPG variables. Think of the parser as a plug-in for DATA-INTO. As long as you have the right plug-in (or “parser”), you can read a data format. In this article, I’ll present a parser for JSON, so you’ll be able to read JSON documents. Other people have already published parsers for property file format and CSV documents. When “the next big thing” happens, you’ll only need a parser for it, and then DATA-INTO will be able to handle it.

One more way to think of it: it is similar to Open Access. Open Access provides a way to use RPG’s native file support to read any sort of data. Where traditional file access called IBM database routines under the covers, Open Access lets you provide your own routine, so you can read any sort of data and aren’t limited to only IBM’s database. DATA-INTO works the same way, except instead of replacing the logic for files, it replaces the logic for XML-INTO with a general-purpose system.

DATA-INTO Requirements

DATA-INTO is part of the ILE RPG compiler and runtime. No additional products need to be installed besides the aforementioned parser. You will need IBM i 7.2 or newer, and you’ll need the latest PTFs installed.

The PTFs required for 7.2 and 7.3 are found here.

When IBM releases future versions of the operating system (newer than 7.3), the RPG compiler will automatically include DATA-INTO support without additional PTFs.

DATA-INTO will not work with the 7.1 or earlier RPG compilers and will not work with TGTRLS(V7R1M0) or earlier specified, even if you are running it on 7.2.

My JSON parser for DATA-INTO requires the open source YAJL software. You will need a copy from April 17, 2018 (or newer) to have DATA-INTO support. It is available at no charge from my website.

DATA-INTO Syntax

DATA-INTO works almost the same as the XML-INTO opcode that you may already be familiar with. The syntax is as follows:

DATA-INTO your-result-variable %DATA(document : rpg-options) %PARSER(parser : parser-options);

-or-

DATA-INTO %HANDLER(handler : commArea) %DATA(document : rpg-options) %PARSER(parser : parser-options);

If you’re familiar with XML-INTO, you’ll notice that the syntax is identical except for the %PARSER built-in function and its parameters. Here’s a description of what each part means:

your-result-variable = An RPG variable to receive the data. Most often, this will be a data structure that is formatted the same way as the document you are reading. Fields in the document will be mapped to corresponding fields in your variable, based on the variable names. I’ll explain this in more detail below.

%DATA(document : rpg-options) = Specifies the document to be read and the options that are understood and used by RPG (as opposed to the parser) when mapping fields into your variable. The document parameter is either a character string containing the document itself or is an IFS path to where the document can be read from disk. The rpg-options parameter is a character string containing configuration options of how the data should be mapped into variables. (It is the same as the second parameter to the %XML built-in function used with XML-INTO and works the same way.) The following is a summary of those options:

  • path option = specifies a location within the JSON document to begin parsing, and lets you parse only a subset of the document if desired
  • doc option = specify doc=string if the document parameter contains the JSON document itself, or doc=file if it contains an IFS path name
  • ccsid option = controls whether RPG does its processing in Unicode or EBCDIC
  • case option = controls how strictly a variable name must match the document field names, including whether it’s case sensitive or whether special characters get converted to underscores
  • trim option = controls whether blanks and other whitespace characters are trimmed from character strings
  • allowmissing option = controls whether it is okay for the document to be missing fields that are in the RPG variable
  • allowextra option = controls whether it is okay for the document to have extra fields that are not in the RPG variable
  • countprefix option = a prefix to be added for RPG fields that should contain a count of the number of matching elements (vs the data of the element)
    • For example, the number of entries loaded into an array
    • Can also be used to determine whether an optional element was/wasn’t provided

I don’t want this post to get too bogged down with the details of each option, so if you’d like to read more details, please see the ILE RPG reference manual.

%PARSER(parser : parser-options) = Specifies a program or subprocedure to act as a parser (or “plugin” as I like to think of it) that will interpret the document. My parser will be a JSON parser, and it will know how to interpret a JSON document and pass its data to RPG. The parser-options parameter is a literal or variable string that’s intended to be used to configure the parser. The format of parser-options, and which options are available, is determined by the code in the parser routine.

%HANDLER(handler : commArea) = Specifies a handler routine to be used as an alternative to a variable. You use this when your data should be processed in “chunks” rather than reading the entire document at once. I consider this to be a more advanced usage of DATA-INTO (or XML-INTO) that is primarily used when it’s not possible to fit all the needed data into a variable. This was very common back in V5R4 when variables were limited to 64 KB but is not so common today. For that reason, I will not cover it in this article. If you’re interested in learning more, you can read about it in the ILE RPG Reference manual, or e-mail me to suggest this as a topic for a future blog post.

The YAJLINTO Parser

The current version of YAJL ships with a program named YAJLINTO, which is a parser that DATA-INTO can use to read JSON documents. You never need to call YAJLINTO directly. Instead you use it with the %PARSER function. Let’s look at an example!

In a production application, you’d get a JSON document from a parameter, an API or a file. To keep this example simple, I’ve hard-coded it in my RPG program by assigning it to a character string as follows:

dcl-s json varchar(5000);

json = '{ +
     "success": true, +
     "errorMsg": "No error reported", +
     "customers":[{ +
       "name": "ACME, Inc.", +
       "address": { +
        "street": "123 Main Street", +
        "city": "Anytown", +
        "state": "WI", +
        "postal": "53129" +
       } +
     }, +
     { +
       "name": "Industrial Supply Limited.", +
       "address": { +
        "street": "1000 Green Lawn Drive", +
        "city": "Fancyville", +
        "state": "CA", +
        "postal": "95811" +
       } +
     }] +
    }';

In last month’s blog entry, I explained quite a bit about JSON and how to process it with YAJL. I won’t repeat all of the details about how it works, but just to refresh your memory, the curly braces (the { and } characters) start and end a JSON data structure. (They call them “objects”, but it is the same thing as a data structure in RPG.) The square brackets (the [ and ] characters) start and end an array.

Like XML-INTO, DATA-INTO requires that a variable is defined that exactly matches the layout of the JSON document. When RPGers write me complaining of problems with this sort of programming, the problem is almost always that their variable doesn’t quite match the document. So please take care to make them match exactly. In this case, the RPG definition should look like this:

dcl-ds myData qualified;
  success ind;
  errorMsg varchar(500);
  num_customers int(10);
  dcl-ds customers dim(50);
    name varchar(30);
    dcl-ds address;
      street varchar(30);
      city varchar(20);
      state char(2);
      postal varchar(10);
    end-ds;
  end-ds;
end-ds;

Take a moment to compare the RPG data structure to the JSON one. You’ll see that the JSON document starts and ends with curly braces and is therefore a data structure – as is the RPG. Since the JSON structure has no name, it does not matter what I name my RPG structure. So, I called it “myData”. The JSON document contains three fields named success, errorMsg and customers. The RPG code must also use these names so that they match. The customers field in JSON is an array of data structures, as is the RPG version. The address field inside that array of data structures is also a data structure, and therefore the RPG version must also be.

The one field that is different is “num_customers”. To understand that, let’s take a look at how I’m using these definitions with the DATA-INTO opcode.

data-into myData %DATA(json: 'case=any countprefix=num_')
         %PARSER('YAJLINTO');

The first parameter to DATA-INTO is the RPG variable to read the data into. In this case, it is the myData data structure shown above.

The JSON data is specified using the %DATA built-in function, which receives the ‘json’ variable – the character string containing the JSON data. The second parameter to %DATA is the options for RPG to use when mapping the fields. I did not need to specify “doc=string” to get the JSON data from a variable because “doc=string” happens to be the default. I did specify two other options, however.

case=any – means that the upper/lowercase of the RPG field names does not have to match that of the JSON field names.

countprefix=num_ – means that if I code a variable starting with the prefix “num_” in my data structure, it should not come from the data, but instead, RPG should populate it with the count of elements. In this case, since I have defined “num_customers” in my data structure, RPG will count the number of elements in the “customers” array and place that count into the “num_customers” field.

That explains why the extra num_customers field is in the RPG data structure. Since customers is an array and I don’t know how many I’ll be sent, RPG can count it for me, and I can use this field to see how many I received. That’s very helpful!

Countprefix is also helpful in cases where a field may sometimes be in the JSON document and sometimes may be omitted. In that case, the count-prefixed field will bypass the “allowmissing” check and allow the field to not exist without error. If the field didn’t exist in the JSON, the count will be set to zero, and my RPG code can detect it and handle it appropriately.
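Here is a minimal sketch of that technique. It assumes a service that leaves errorMsg out of the document when everything went well; the response data structure, the num_errorMsg subfield and the JSON string are all made up for this illustration.

```rpgle
**free
dcl-s json varchar(5000);

// Illustrative layout: errorMsg is optional in the document
dcl-ds response qualified;
  success ind;
  errorMsg varchar(500);
  num_errorMsg int(10);   // count filled in by RPG, not taken from the JSON
end-ds;

// This (made-up) document has no errorMsg field at all
json = '{ "success": true }';

data-into response %DATA(json : 'case=any countprefix=num_')
                   %PARSER('YAJLINTO');

if response.num_errorMsg = 0;
  // errorMsg was not in the document, so supply a default
  response.errorMsg = 'No error reported';
endif;

*inlr = *on;
return;
```

Because num_errorMsg is defined, the missing errorMsg field should not cause DATA-INTO to fail, and the count tells the program whether the value came from the document.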

The %PARSER function tells DATA-INTO which parser program to call. In this case, it is YAJLINTO. The %PARSER function is capable of specifying either a program or a subprocedure and can also include a library if needed.

When specifying a program as the parser, it can be in any of the following formats:

'MYPGM'
'*LIBL/MYPGM'
'MYLIB/MYPGM'

Notice that the program and library names are in capital letters. Unless you are using case-sensitive object names on your IBM i (which is very unusual), you should always put the program name in capital letters.

To specify a subprocedure in a service program, use one of the following formats:

'MYSRVPGM(mySubprocedure)'
'MYLIB/MYSRVPGM(mySubprocedure)'
'*LIBL/MYSRVPGM(mySubprocedure)'

The subprocedure name must be in parentheses to denote that it is a subprocedure rather than a program name. Like the program names above, the service program name should be in all capital letters. However, subprocedure names in ILE are case-sensitive, so you will need to be sure to match the case exactly as it is exported from the service program. Use the DSPSRVPGM command to see how the procedures are exported.

After the DATA-INTO opcode runs successfully, the myData variable will be filled in, and I can use its data in my program just as I would use any other RPG variable. For example, if I wanted to loop through the customers and display their names and cities, I could do this:

dcl-s x int(10);

for x = 1 to myData.num_customers;
  dsply myData.customers(x).name;
  dsply myData.customers(x).address.city;
endfor;

Naturally, you wouldn’t want to use the DSPLY opcode in a production program, but it’s a really easy way to try it and see that you can read the myData data structure and its subfields. Now that you have data in a normal RPG variable, you can proceed to use it in your business logic. Write it to a file if you wish, or print it, place it on a screen, whatever makes sense for your application.
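The loop above assumes that DATA-INTO ran successfully. As a hedged sketch of how a program might guard against failure, the example below wraps the opcode in a MONITOR group and also reads the document from the IFS with doc=file instead of from a variable. The path /tmp/customers.json and the ok indicator are assumptions made for the illustration; the data structure is the same myData layout shown earlier.

```rpgle
**free
dcl-s x int(10);
dcl-s ok ind inz(*on);

// Same layout as the myData structure shown earlier
dcl-ds myData qualified;
  success ind;
  errorMsg varchar(500);
  num_customers int(10);
  dcl-ds customers dim(50);
    name varchar(30);
    dcl-ds address;
      street varchar(30);
      city varchar(20);
      state char(2);
      postal varchar(10);
    end-ds;
  end-ds;
end-ds;

monitor;
  // doc=file: the first %DATA parameter is an IFS path (hypothetical here)
  data-into myData %DATA('/tmp/customers.json'
                       : 'doc=file case=any countprefix=num_')
                   %PARSER('YAJLINTO');
on-error;
  // Reached when the opcode fails, for example if the file cannot be
  // read or the document does not match myData
  ok = *off;
endmon;

if ok;
  for x = 1 to myData.num_customers;
    dsply myData.customers(x).name;
  endfor;
endif;

*inlr = *on;
return;
```

A catch-all ON-ERROR is used here to keep the sketch short. A production program would probably want to log the failure or check the specific status code rather than just setting a flag.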

Parser Options for YAJLINTO

Earlier I mentioned that %PARSER has a second parameter for “parser options.” This parameter is optional, and I didn’t use it in the above example. However, there are some options available with YAJLINTO that you might find helpful.

Unlike the standard DATA-INTO (or XML-INTO) options, YAJLINTO expects its options to be JSON formatted. I designed it this way because YAJL already understands JSON format, so it was very easy to code. But, it is also powerful. I can add new features in the future (without interfering with the existing ones) simply by adding new fields to the JSON document.

As I write this, there are three options. All of them are optional, and if not specified, the default value is used.

value_true = the value that will be placed in an RPG field when the JSON document specifies a Boolean value that is true. By default, this puts “1” in the field because it’s assumed that Booleans will be mapped to RPG indicators. You can set this option to any alternate value you’d like to map, up to 100 characters long.

value_false = the value that will be placed in an RPG field when the JSON document specifies a Boolean value that is false. The default value is “0”.

value_null = value placed in an RPG variable when a JSON document specifies that a field is null. Unfortunately, DATA-INTO does not have the ability to set an RPG field’s null indicator, so a special value must be placed in the field instead. The default is “*NULL”.

For example, consider the following JSON document:

json = '{ "inspected": true, "problems_found": false, +
     "date_shipped": null }';

In this example, I prefer “yes” and “no” for the Boolean fields. It’ll say, “yes it was inspected” or “no problems were found”. The date shipped is a date-type field and therefore cannot be set to the default null value of *NULL. So, I want to map the special value of 0001-01-01 to my date. I can do that as follows:

dcl-ds status qualified;
  inspected varchar(3);
  problems_found varchar(3);
  date_shipped date;
end-ds;

data-into status %data(json:'case=any')
                 %parser('YAJLINTO' : '{ +
                   "value_true": "yes",+
                   "value_false": "no",+
                   "value_null": "0001-01-01" +
                 }');

When this code completes, the inspected field in the status data structure will be set to “yes”, and the problems_found field set to “no”. The date_shipped will be set to Jan 1, 0001 (0001-01-01).
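Once the special values are in the fields, the business logic can test for them like any other data. The following is only a sketch of one way to do that; the shipped indicator and the message text are my own additions for illustration.

```rpgle
**free
dcl-s json varchar(5000);
dcl-s shipped ind;

dcl-ds status qualified;
  inspected varchar(3);
  problems_found varchar(3);
  date_shipped date;
end-ds;

json = '{ "inspected": true, "problems_found": false, +
          "date_shipped": null }';

data-into status %DATA(json : 'case=any')
                 %PARSER('YAJLINTO' : '{ +
                   "value_true": "yes", +
                   "value_false": "no", +
                   "value_null": "0001-01-01" +
                 }');

// 0001-01-01 is the stand-in for null chosen in the parser options,
// so any other date means the JSON held a real shipping date
shipped = (status.date_shipped <> d'0001-01-01');

if status.inspected = 'yes' and not shipped;
  dsply 'Inspected, but not yet shipped';
endif;

*inlr = *on;
return;
```

The trade-off with a stand-in value is that it must be one that can never occur as real data, which is why a date as unlikely as 0001-01-01 is a reasonable choice here.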

Debugging and Diagnostics

DATA-INTO can write out trace information showing what RPG and the parser did during their processing. To enable this, you’ll need to add an environment variable to the same job as the program that uses data-into. For example, you could type this:

ADDENVVAR ENVVAR(QIBM_RPG_DATA_INTO_TRACE_PARSER) VALUE('*STDOUT')

In an interactive job, this will cause the trace information to scroll up the screen as data-into runs. In a batch job, it would be sent to the spool. Information will be printed about what character sets were used and which fields and values were found in the document.

For example, in the case of the “status” data structure example in the previous section, the trace output would look like this:

---------------- Start ---------------
Data length 136 bytes
Data CCSID 13488
Converting data to UTF-8
Allocating YAJL stream parser
Parsing JSON data (yajl_parse)
No name provided for struct, assuming name of RPG variable
StartStruct
ReportName: 'inspected'
ReportValue: 'yes'
ReportName: 'problems_found'
ReportValue: 'no'
ReportName: 'date_shipped'
ReportValue: '0001-01-01'
EndStruct
YAJL parser status 0 after 68 bytes
YAJL parser final status: 0
---------------- Finish --------------

Writing Your Own Parser

When DATA-INTO is run, it loads the document into memory and then calls the parser. It passes the parser a parameter that contains the document to read as well as information about all of the options the user specified. The parser is then responsible for interpreting the document and calling some subprocedures to tell DATA-INTO what was found.

Writing a parser is best done by someone who is good at systems-type coding. You will need to work with pointers, procedure pointers, CCSID conversions and system APIs. I suspect most RPGers will want to find (or maybe purchase) a third-party parser rather than write their own. For that reason, I will not teach you how to write a parser in this blog entry.

However, if you’d like to learn more about this in a future installment of Scott’s iLand, please leave a comment below. If enough people are interested, I’ll write one.

In the meantime, I suggest the following options:

The Rational Open Access: RPG Edition manual has a chapter on how to write a parser and provides an example of a properties document parser.

Jon Paris and Susan Gantner recently published some articles that explain how to use DATA-INTO as well as write a parser. They provide an example of reading a CSV file.

Attending POWERUp18? Be sure to check out Scott’s sessions.

Interested in learning more about DATA-INTO? Attend this session from Barbara Morris.

See you in San Antonio!

One thought on “Parsing JSON with DATA-INTO! (And, Vote for Me!)”

  1. I just started a project that needs to call an API passing JSON & getting back a JSON response from our credit card processor.
    I would love to see a sample program that does that using the new data-into to get back the response.

    Look forward to your sessions at PowerUp18!!
