PSTextMerge merges lists of tabular data into text templates to create fully populated text files. The output files can be Web pages, XML files, RSS Feeds or any other varieties of text files. The input data can be obtained from a tab-delimited data file, or directly from a Microsoft Excel (‘.xls’) spreadsheet, or from a number of other sources. You can record and easily play back recorded scripts, making it easy to re-generate your output text files when your data changes.
The input data can be sorted and filtered before being used to generate output, and these operations can be scripted as well.
Tab-delimited files can be easily exported from most spreadsheets, databases, address book, and many other programs.
PSTextMerge can also perform other operations with tabular data, extracting and merging data from multiple bookmarks files, address books, and other formats.
PSTextMerge is written in Java and can run on any reasonably modern operating system, including Mac OS X, Windows and Linux. PSTextMerge requires a Java Runtime Environment (JRE), also known as a Java Virtual Machine (JVM). The version of this JRE/JVM must be at least 6. Visit www.java.com to download a recent version for most operating systems. Installation happens a bit differently under Mac OS X, but generally will occur fairly automatically when you try to launch a Java app for the first time.
PSTextMerge Copyright © 1999 - 2013 Herb Bowie
As of version 4.10, PSTextMerge is open source software.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
PSTextMerge also incorporates or adapts the following open source software libraries.
BrowserLauncher2 — Copyright 2004 - 2007 Markus Gebhard, Jeff Chapman, used under the terms of the GNU General Public License.
JExcelAPI — Copyright 2002 Andrew Khan, used under the terms of the GNU General Public License.
parboiled — Copyright 2009-2011 Mathias Doenitz, used under the terms of the Apache License, Version 2.0.
pegdown — Copyright 2010-2011 Mathias Doenitz, used under the terms of the Apache License, Version 2.0.
Download the latest version from PowerSurgePub.com. Decompress the downloaded file. Drag the resulting file or folder into the location where you normally store your applications.
PSTextMerge has two arguments which allow the program to be run completely from the command line, without any graphical user interface. The first argument is specified as “-q” and tells the program to run in “quiet mode” — that is, without any GUI. When this option is specified, the log file will be written to disk, as described in the Logging section.
The second argument specifies the location and name of a script file to be played, the location being relative to the PSTextMerge program.
Together, these two options allow PSTextMerge to be executed from a script or batch file, without any user interaction.
Most of PSTextMerge’s user interface elements are laid out in a series of tabs that proceed in a natural progression from left to right. Most of the functions performed within these tabs may also be triggered by their corresponding script commands.
This function allows PSTextMerge to populate a list to be used in later processing. The following controls are available.
Clicking on this button will allow you to select the file or directory to be input. Make sure that all of the following input options are properly set before pressing this button. This function can also be invoked via the File/Open Menu item or with the O shortcut key.
Note that, when a directory (aka folder) is selected, then all eligible files within that directory will be processed, and the resulting list will consist of the concatenation of all the rows of data generated from all of the eligible files.
The default input is a tab-delimited text file, which may have been saved or exported from a spreadsheet, database, address book or other tabular data source. But other options are available as well. Select the one you want from the drop-down list.
Following are descriptions of each of the currently available options.
This is a text file. Each line in the file must use tabs (or commas, if the file extension is “.csv”) to separate each field, or column, in the line. The first line of the file must contain column headings.
This is a text file format that can be used to help plan events for a club, such as an alumni club.
See the Club Planner specification for details.
Setting this option will cause the program to treat the input file as a Microsoft Excel spreadsheet, in Excel 97 - 2004 format (with an ‘.xls’ extension). The first or only worksheet (tab) will be accessed. The first row will be expected to contain column headings, with data in following rows. The first blank row will terminate the list. Each row in the spreadsheet (after the first, containing headings) will be treated as a data record, and each column will be treated as a separate field. Columns containing hyperlinks will also generate fields containing the hyperlinks, and named by appending “link” to the column heading. For example, a column named “ISBN” could have its content accessed with the variable “isbn” and its link accessed with the variable “isbnlink”.
Setting this option will also cause the program to treat the input file as a Microsoft Excel spreadsheet, in Excel 97 - 2004 format (with an ‘.xls’ extension). The first or only worksheet (tab) will be accessed. The first blank row will terminate the list. With this option, each row in the spreadsheet will be returned as a single data field, identified by a variable name of “Table Row”. The data returned will include beginning and ending td tags for each column, with appropriate formatting and cell dimensions and hyperlinks, mimicking the format of the Excel spreadsheet as closely as possible. The data returned does not include beginning or ending tr tags.
When you specify a file directory as your data source, each entry in that directory will then be treated as if it were a single record, or line, or row, from a data file. The “Maximum Directory Depth” field below will control the depth to which sub-directories are read.
See the File Directory specification for details.
Setting this option will cause the program to treat the input file as HTML. It will expect the file to contain bookmarks, using nested lists to indicate a folder hierarchy. A table will be created with the following columns. Many bookmark managers, including popular browsers, can save or export bookmarks into this format.
Setting this option will cause the program to treat the input file as HTML. It will expect the file to contain bookmarks, using varying levels of heading tags to indicate a folder hierarchy. A table will be created with the same columns as the HTML Bookmarks using Lists, described above. The contents of the varying heading levels will be placed in Category 1 through 6.
Setting this option will cause the program to treat the input file as HTML. It will expect the file to contain a table, using table, tr, th and/or td tags. The first row in the table will be expected to contain column headings, with following rows containing the corresponding data.
Note that many HTML pages are laid out using tables, not to present columns and rows of data, but to format the page in a desired fashion. Tables containing column headings and following data may often be embedded in such layout tables. In order to try to separate the two, PSTextMerge in this mode will look for the first table cell (within th or td tags) containing 1 to 40 characters of text, with the idea that text beyond either extreme is not likely to be a true column heading. During this search for the first cell of a data table, table cells with colspan or rowspan parameters greater than 1 will also be ignored.
In some cases, however, you may need to edit the prospective input file with your favorite text editor, and delete lines preceding and following the table containing the data you are interested in, saving the resulting file as a separate file to be input to PSTextMerge. If a true first column heading is longer than 40 characters, then you will also need to reduce its length to the acceptable 1-40 range.
Note that for any HTML input option, character entities found within HTML text will be translated to their equivalent ASCII characters. For now, translation is only provided for characters that are not platform-specific: “ ” (non-breaking space), “<” (less than sign), “>” (greater than sign), “&” (ampersand) and “"” (double quotation marks). Entities may be specified using mnemonics or their numeric equivalents.
Setting this option will cause the program to treat the input file as HTML. It will extract all of the links within this file, identifying the from page, the link type, and to file, for each link found. Link type may be linkhref, imgsrc, or ahref, for the corresponding tag and attribute combinations.
Selecting this option will cause the input file to be treated as an iTunes Library.xml file.
This file can generally be found in your Music / My Music folder, and then within your iTunes folder. See Apple Support articles Where are my iTunes files located? and What are the iTunes library files? for details.
This information can be used to publish your list of albums to a Web site, or to import your list of albums into a database, such as Bento.
This input routine will extract and summarize information about albums contained in your iTunes library. For each unique album title, the following fields will be extracted.
If you select a folder containing Markdown files, then this input option will extract metadata about each file and make them available in a list.
See the Markdown Metadata specification for details.
Setting this option will cause the program to treat the input file as XML. Each field will be returned as a separate record. Field names will be stored in columns 1 - 4, with column names “Name1” through “Name4”, and field values will be stored in column 5, with column name “Data”.
Setting this option will cause the program to treat the input file as XML. Each set of fields at the same level will be returned as a separate record, with the assumption that the XML file consists of a series of records with the same fields. Field names will be stored in columns 1 - 4, with column names “Name1” through “Name4”, and field values will be stored in column 5, with column name “Data”.
Clicking on this check box causes the program to look for a special data dictionary file to accompany the tab-delimited data source. The data dictionary file must have the same name as the primary file, but with a file extension of “.dic”.
The data dictionary file itself is in tab-delimited format. Each row in the file (after the column headings) represents one column in the primary data file.
Note that the easiest way to create a data dictionary for a file is to input the file to PSTextMerge (without a dictionary), then Output it with a dictionary (see the Output section for details). You can then edit the resulting file, only modifying the values you wish to change.
A dictionary may have the following columns describing the fields in the primary file.
This is the column heading that identifies this field in the primary data file. It would normally include mixed-case, spaces and punctuation to make it readable.
This is really a lowest-common denominator form of the column heading, with capitals, spaces and punctuation all removed. This is used internally by PSTextMerge to allow slight variations in punctuation, etc., without recognizing them as two separate names.
If the names previously given should be treated as an alias for another field name, then this column should contain the primary name for the field, and the remaining columns for the alias should be left blank. This feature doesn’t do much in this release of PSTextMerge, but should become more meaningful in later releases.
The default value here is “DataFormatRule”, which causes no special formatting of the input data. But specifying another value can cause the input data to be formatted and even converted according to any of the following special rules.
If a valid function name is specified here, then the field value will be calculated using the specified function. The following functions are available. The functions make use of the parameters specified in the following columns.
Parm | Used As |
---|---|
Parm1 | Name of the file containing the lookup table. |
Parm2 | Name of the field in the lookup table that should be used as the table’s key. |
Parm3 | A value of “Yes” or “True” indicates that the key comparison should be case-sensitive. |
Parm4 | Name of the field in this file that should be used as the lookup key. |
Parm5 | Name of the field in the lookup file that should be returned and used as the value for this field. |
Used as input to the specified function.
Used as input to the specified function.
Used as input to the specified function.
Used as input to the specified function.
Used as input to the specified function.
Select one of the following radio buttons to indicate how and whether you want new data to be merged with existing data.
This is the default. The next data source to be input will overlay any data previously input.
The next input data source to be opened will be merged with the current data visible on the View Tab. Existing sort keys and filters may need to be reapplied. The new data and the old data should have column names that are at least partially overlapping, if not identical.
The next input data source to be opened will be merged with the current data visible on the View Tab. The program will not look for column headings in the new input file, but will instead assume that the column names for the new file are the same, and in the same order, as those currently visible on the View Tab. Existing sort keys and filters may need to be reapplied.
Note that this merge option is useful for programs (such as AppleWorks) that do not include column headings in their export files. If a separate file is created containing only column headings, then it can be input first, followed by the headerless file, with this merge option.
If you are about to read a file directory, then this field controls whether sub-directories are read, and to what depth. A value of 1 is the default, and indicates that only files and directories in the specified directory will be listed, with no sub-directory contents. A value of 2 indicates one level of sub-directories, and so forth. Use the Increment and Decrement buttons to change the depth.
This tab allows the user to view a tab-delimited data file that has been opened for input. Subsequent tabs, such as Sort and Filter, will affect the data that is displayed on the View tab.
The user can scroll from left to right and up and down, assuming there is more data available than will fit within the current window. Columns can also be resized by clicking on their right borders and dragging. Each initial column size will be approximately proportional to the largest data field within the column.
This tab allows the user to sort the data that has been input. Sorting is accomplished by using the following buttons that appear on this tab.
This is a drop down list of all the columns in your data. Select the next field name on which you wish to sort, by starting with the most significant fields and proceeding to less significance.
This is a drop down list. Pick either ascending (lower values towards the top, higher values towards the bottom) or descending. This sequence applies to the currently selected field name (see above).
Pressing this button will add the field and sequence currently specified to the current sort parameters being built. The sort parameters added will appear in the text area shown below on this tab. After pressing the Add button, the user may go back and specify additional fields to be used in the sort criteria.
Pressing this button will clear the sort parameters being built, so that you can start over.
Once your desired sort parameters have been completely built, by pressing the Add button one or more times, you must press the Set button to cause your parameters to be applied to the data you are currently processing.
After setting the desired sort sequence, you may optionally press this button to combine records with duplicate sort keys. The following buttons allow you to adjust the parameters controlling the combination process.
Record combination can be done with varying degrees of tolerance for data loss. Select one of the following radio buttons.
If you specify some data loss to be acceptable, then this field may be used to specify a minimum number of data (non-key) fields that must be lossless (equal or one blank) before combination will be allowed to take place. This should be used if the sort keys are not guaranteed to establish uniqueness. Specifying a non-zero value here may help to prevent completely disparate records from being inadvertently combined. For example, names can be used to identify people, but two different people may have the same name.
This tab allows the user to filter the data that has been input, selecting some rows to appear and others to be suppressed. Filtering is accomplished by using the following buttons that appear on this tab.
This is a drop down list of all the columns in your data. Select the next field name on which you wish to filter.
This is a drop down list. Pick the operator that you want to use to compare your selected field to the following value. The following operators are available.
This is the value to which the selected field will be compared. Only rows that satisfy this comparison will be visible after the filtering operation. You may type in a desired value, or select from the drop down list. The drop down list will consist of all the values found in this field within your data.
Pressing this button will add the field, operator and value currently specified to the current filter parameters being built. The filter parameters added will appear in the text area shown below on this tab. After pressing the Add button, the user may go back and specify additional fields to be used in the filtering criteria.
Pressing this button will clear the filter parameters being built, so that you can start over.
Once your desired filter parameters have been completely built, by pressing the Add button one or more times, you must press the Set button to cause your parameters to be applied to the data you are currently processing.
If you specify more than one filter parameter, then you may specify whether all of them must be true (and) or only any one of them must be true (or) in order to satisfy the filtering criteria. This choice applies to the entire set of criteria, so this need only be selected once before pressing the Set button.
This tab allows the user to save the current data to an output disk file. The data will be saved in the sequence set by the current sort parameters, if any. If a current filter is in effect, then only visible, filtered data will be saved. If any data transformation or formatting occurred on input, then the data will be saved in its new form. The following controls appear on this tab.
Clicking on this button allows you to select a location and name for the output file, and then writes the file (with optional dictionary) as specified. This function can also be invoked via the File/Save Menu item or with the S shortcut key.
Clicking on this check box causes the program to write a special data dictionary file to accompany the tab-delimited data source being output. The data dictionary file will have the same name as the primary file, but with a file extension of “.dic”. Clicking this check box again will cause it to revert to its original state, in which the dictionary will not be saved. See the Input section for a complete description of the dictionary file.
This tab allows the user to merge the currently loaded data into a template file, producing one or more output text files. The greatest anticipated use for this function is to create Web pages, based on input template files containing a mixture of HTML tags and special PSTextMerge tags. This allows tab-delimited data to be periodically merged into an HTML template that determines the format in which the data will be displayed on a Web site.
This screen contains the following buttons.
PSTextMerge supports the concept of a central template library where you can store reusable templates. The initial location for this folder is the “templates” folder within the PSTextMerge Folder that comes as part of the software distribution. However, this button can be used to allow you to select another folder as your template library. After installation of PSTextMerge, you may wish to copy the templates folder to another location, perhaps within your home folder, or your documents folder, and then use this button to specify that new location.
This button allows you to specify the location and name of the template file you wish to use. (This file must have previously been created using a text editor.) This function may also be invoked via the Template/Open Menu item or with the T shortcut key.
This button also opens a template file, but uses your template library as the starting location.
This button processes the template file you have selected, and creates whatever output file(s) you have specified in the template file. The function may also be invoked via the Template/Generate Menu item or with the G shortcut key.
See the Template File Format specification for details.
This tab allows the user to record and playback sequences of PSTextMerge commands. The following buttons and menu commands are available.
Clicking on this button once causes the program to begin recording your subsequent actions as part of a script that can be edited and played back later. This function may also be invoked via the Script/Record Menu item, or with the R shortcut key. You will need to specify the location and name for your script file. It is recommended that “.tcz” be used as a file extension for PSTextMerge script files (the original name for the program was “TDF Czar”). This will be supplied as a default if no extension is specified by the user.
Clicking on this button causes recording of the current script to stop. This function may also be invoked via the Script/End Recording Menu item, or with the E shortcut key. The script file will be closed, and can now be opened for editing, if desired, using the text editor, spreadsheet or database program of your choice.
This button allows you to select a script file to be played back. This function may also be invoked via the Script/Play Menu item, or with the P shortcut key. At the end of a script’s execution, the input file options will be reset to their initial default values, to ensure consistent execution when multiple scripts are executed consecutively.
This button allows you to replay the last script file either played or recorded. Using this button allows you to bypass the file selection dialog. It can be handy if you are developing, modifying or debugging a series of actions and associated files. This function may also be invoked via the Script/Play Again Menu item, or with the A shortcut key.
Clicking this button will allow you to select a script to be automatically played every time the application is launched.
After selecting a script to play automatically, the label of this button will change to “Turn Autoplay Off”.
Clicking this button will allow you to select a folder of scripts that you want easy access to. A new tab will then be added to the interface, labeled “Easy”. The new tab will contain a button for every script found in the folder. Clicking on a button will then play the corresponding script.
After selecting an Easy Play folder, the label of this button will change to “Turn Easy Play Off”.
This menu item, within the Script menu, allows you to select a recently played script to run. The most recent 10 scripts will be available to select from.
The script file is itself a tab-delimited text file, and you can edit one using your favorite tool for such things. You can create one completely from scratch if you want, but it usually easiest to record one first, and then edit the results.
The script file has the following columns.
Following is a complete list of all the allowable forms for script commands. Constants are displayed in normal type. Variables appear in italics. Blank cells indicate fields that are not applicable to a particular command, and therefore can be left blank or empty. Forward slashes are used to separate alternate values: only one of them must appear (without the slash) in an actual script command. Most of the values correspond directly to equivalent buttons on the tabs, as described elsewhere in this user guide. The one non-intuitive value is probably the Filter values for the andor object: True sets “and” logic on, while False sets “or” logic on.
Note that file names may begin with the literal “PATH” surrounded by “#” symbols. When recording a script, the program will automatically replace the path containing the script file with this literal. In addition, upwards references from the location of the script file will be indicated by two consecutive periods for each level in the folder hierarchy. On playback, the reversing decoding will occur. In effect this means that files within the same path structure as the script file, or a sub-folder, will have their locations identified relative to the location of the script file. Files on a completely different path will have their locations identified with absolute drive and path information. The overall effect of this is to make a script file, along with the input files referenced by the script file, portable packages that can be moved from one location to another, or executed with different drive identifiers, and still execute correctly. Normally all of this will be transparent to the user.
Similarly, the literal “#TEMPLATES#” will be used as a placeholder for the path to the current template library, as set with the Set Template Library button on the Template tab.
The “epubin” and “epubout” actions require some additional description, since they have no correlates on the Script tab just described. The former identifies a directory containing the contents of an e-book in the EPUB format; the latter identifies the “.epub” file to be created using that directory as input.
module | action | modifier | object | value |
---|---|---|---|---|
input | open | url | merge/blank | url name |
input | open | file | merge/blank | file name |
input | open | dir | merge/blank | directory name |
input | open | html1 | merge/blank | file name |
input | open | html2 | merge/blank | file name |
input | open | html3 | merge/blank | file name |
input | open | xml | merge/blank | file name |
input | open | xls | merge/blank | file name |
input | epubin | dir | blank | directory name |
input | epubout | file | blank | file name |
sort | add | Ascending/ Descending | field name | |
sort | clear | |||
sort | set | params | ||
combine | add | dataloss | integer | |
0 = no data loss, 1 = one record overrides, 2 = allow concatenation | ||||
combine | add | precedence | integer | |
+1 = later overrides earlier, -1 = earlier overrides later | ||||
combine | add | minnoloss | integer | |
combine | set | params | ||
filter | set | andor | True/ False | |
filter | add | operator | field name | comparison value |
filter | clear | |||
filter | set | params | ||
output | set | usedict | True/ False | |
output | open | file name | ||
template | open | file | file name | |
template | generate |
This tab allows the user to control logging operations. PSTextMerge writes information about certain events to a log file. Reviewing this data can be useful, especially if the program is not performing as expected. The following sections appear on this tab.
This determines where the log output is sent. You can select any of the following options.
This determines the quantity and severity of messages that will appear in the log. You have the following options.
Input data files are often passed to the logger, primarily so that significant events that are data-related can include a display of the data record that generated the event. Checking the “Log All Data?” box will result in all data passed to the logger being written to the log. This may be helpful if the log is otherwise showing insufficient data to let you understand the workings of the program.
This tab allows the user to view some basic information about the PSTextMerge program. It includes a version number and copyright information. More complete information is contained in this user guide.
This tab can also be reached by selecting the About PSTextMerge Menu Item. On the Mac, this will appear in the Application Menu. On other operating systems, it will appear in the Help Menu.
The Help Menu includes three other useful items. The first, User Guide, attempts to launch the PSTextMerge User Guide installed with your program as a window within a local Web Browser. This operation will not always be successful, due to operating system restrictions. In some cases, as an alternative, your Web Browser will be directed to the PSTextMerge Home Page on the Web. It is possible that the information on the Web may not apply to your version of PSTextMerge. You can always navigate to your local user guide by using your file system navigator to select the file “userguide/index.html” within your PSTextMerge application folder, and then launching it using your preferred Web Browser.
The next item on the Help Menu will attempt to direct your Web Browser to the PSTextMerge Home Page. You will need an active Internet connection for this to be successful. Again, operating system restrictions may prevent this action from being completed successfully. If so, you can always copy and paste the following address into your Web Browser: http://www.powersurgepub.com.