FIDAL - Financial Data Access Library
ASCII Data Source
1.0 Introduction
2.0 Pre-defined ASCII File Format
3.0 User-defined ASCII File Format
4.0 FD_AddDataSource Parameters Details
1.0 Introduction
ASCII files are probably the simplest way to store stock market data. These files can easily be generated, converted or maintained with off-the-shelf software. Most commercial data providers include a conversion tool that can translate their data to at least the ASCII format.
Making FIDAL use your ASCII files is very easy. You need to indicate in which directory the files can be found, and you can use wildcards to include multiple files in one call. You also need to specify the format in which your data is stored. Most users can simply use one of the pre-defined file formats. See 2.0 Pre-defined ASCII File Format. More advanced users can describe a custom file format with a field definition, which also allows advanced capabilities like extracting fields from the filename or even the directory path! See 3.0 User-defined ASCII File Format.
A very small ASCII database is provided with the software package for experimentation.
2.0 Pre-defined ASCII File Format
To use a pre-defined format, you must first know in which order the data is stored in your ASCII files. You can then add the files one by one, or many at the same time with wildcards. Adding a file or a directory to the unified database is done with FD_AddDataSource. Here is an example adding all files from the "my_data" directory into the "US.NASDAQ.STOCK" category:
FD_AddDataSourceParam param;
memset( &param, 0, sizeof( FD_AddDataSourceParam ) );
param.id = FD_ASCII_FILE;
param.location = "my_data\\*.*"; /* all files in the "my_data" directory */
param.info = FD_DOHLCV;
param.category = "US.NASDAQ.STOCK";
FD_AddDataSource( unifiedDatabase, &param );
(Note: in ANSI C, a '\' must be written as '\\' inside a string literal to avoid confusion with escape sequences such as '\n'.)
By default, the name of the file becomes the "symbol" name in the database. You can refine or extract the symbol name differently by using the field capability (see next section). If you have the choice, I strongly suggest keeping it simple: just use the file name as the symbol name.
The order of the fields is given by "param.info" (FD_DOHLCV in the above example). See "fidal.h" for the list of pre-defined types. Examples are: FD_DOHLCV, FD_DOCHLV, FD_DCV... The comma represents a separator. The separator can be any character except a digit or a decimal point '.'.
OK, let me give some concrete examples. All the following formats are parsed correctly:
Example 1: the CSV format from Microsoft Excel.
DATE : Open | High | Low | Close |Volume| O. Int.|
As you can see, all these files simply have in common the order in which the data is provided.
Some basic rules for accepting a variety of ASCII files:
Using that example, you can easily figure out all the other pre-defined formats. If no pre-defined format is convenient (say, because the date is not in the same format), see section 3.0 User-defined ASCII File Format.
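To make the separator rule concrete, here is a minimal sketch in C of how one line could be split into numeric fields when any character other than a digit or '.' acts as a separator. The function split_numeric_fields is hypothetical and is not part of the FIDAL API; it only illustrates the rule, assuming every field is an unsigned number.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of FIDAL: splits one line into numeric
   fields. Any character other than a digit or '.' acts as a separator.
   Stores at most max_fields values in out and returns how many were stored. */
int split_numeric_fields(const char *line, double *out, int max_fields)
{
    int count = 0;
    const char *p = line;

    while (*p != '\0' && count < max_fields) {
        /* Skip any run of separator characters. */
        while (*p != '\0' && !isdigit((unsigned char)*p) && *p != '.')
            p++;
        if (*p == '\0')
            break;

        /* The field is the run of digits and '.' that follows. */
        const char *start = p;
        while (isdigit((unsigned char)*p) || *p == '.')
            p++;

        /* Copy the field so strtod cannot read past it (e.g. "1e5" or "0x1"). */
        char buf[64];
        size_t len = (size_t)(p - start);
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, start, len);
        buf[len] = '\0';
        out[count++] = strtod(buf, NULL);
    }
    return count;
}
```

In this sketch the dashes inside a date such as "2001-01-02" are separators too, so the date contributes separate year, month and day values.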
3.0 User-defined ASCII File Format
3.1 Describing File Content
When none of the pre-defined file formats applies, you need to build your own field string. That string describes how each line of the file is interpreted. If you look at "fidal.h", you will see that the pre-defined formats are simply strings specifying the order of the fields (reminder: this field string is the "param.info" of the FD_AddDataSource function).
Example: FD_DOHLCV is the pre-defined string "[Y][M][D][O][H][L][C][V]". If you need a different date format, say Month/Day/Year, use the string "[M][D][Y][O][H][L][C][V]" as the "param.info" parameter when calling FD_AddDataSource. Other examples:
"[HR][MIN][SEC][O][H][L][C][V]" is a variant for intra-day price bars.
"[D][M][Y][C]" is for daily data with only a close price (like a mutual fund).
"[M][Y][C][V][OI]" is for monthly commodity/future data with an open interest field.
Here is the complete list of available fields that can be used to describe the content of the file:
Finally, here is an example of the most complex format I can think of:
"[-H=12][-C=10][-I][-R=2][YYYY][M][DD][-I=1][V][O][C][H][L][OI][HR][MN=5]"
In that example, the first 12 lines of the file are skipped. For each line, the first 10 characters are always ignored. The following integer and 2 real values are then ignored as well. Then the date fields are extracted, after which one integer is ignored, and finally all the remaining fields are read. [MN=5] forces the periodicity of the price bars onto 5-minute boundaries.
3.2 Defining Periodicity
The periodicity is the amount of time between each price bar (daily, monthly, 10 minutes, etc.). In most cases, FIDAL selects by default the most logical periodicity depending on the date/time fields specified. The rules are the following (in order):
1) If one of the time fields is specified, the data is assumed to be intra-day. By default, the periodicity is determined by the amount of time between the first two price bars in the file. It is also possible to force the periodicity (see below).
For intra-day data, you can force the periodicity by adding information to ONLY ONE of the time fields:
[HOUR=n] or [HH=n] where 'n' is the hour increment.
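The field strings in this section follow a regular "[NAME]" or "[NAME=n]" pattern. The following sketch, assuming exactly that syntax, splits a field string into tokens and checks the rule that only one time field may carry an increment. FieldToken, parse_field_string, time_increments_valid and the list of time-field names are hypothetical illustrations; FIDAL's own parser and its complete field list are not shown here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token for one "[NAME]" or "[NAME=n]" entry of a field string. */
typedef struct {
    char name[16];  /* e.g. "MN" or "-H" */
    int  has_value; /* 1 when the entry carried "=n" */
    int  value;     /* n, or 0 when absent */
} FieldToken;

/* Splits a field string into tokens. Returns the number of tokens,
   or -1 if the string is malformed or has more than max_tokens entries. */
int parse_field_string(const char *s, FieldToken *out, int max_tokens)
{
    int count = 0;
    while (*s != '\0') {
        const char *close = strchr(s, ']');
        if (*s != '[' || close == NULL || count >= max_tokens)
            return -1;
        /* An '=' before the closing bracket introduces the value. */
        const char *eq = memchr(s, '=', (size_t)(close - s));
        const char *name_end = (eq != NULL) ? eq : close;
        size_t len = (size_t)(name_end - (s + 1));
        if (len == 0 || len >= sizeof out[count].name)
            return -1;
        memcpy(out[count].name, s + 1, len);
        out[count].name[len] = '\0';
        out[count].has_value = (eq != NULL);
        out[count].value = (eq != NULL) ? atoi(eq + 1) : 0;
        count++;
        s = close + 1;
    }
    return count;
}

/* Assumed set of time-field names, taken from the examples in this document. */
static int is_time_field(const char *name)
{
    static const char *const time_fields[] = { "HR", "HH", "HOUR", "MIN", "MN", "SEC" };
    for (size_t i = 0; i < sizeof time_fields / sizeof time_fields[0]; i++)
        if (strcmp(name, time_fields[i]) == 0)
            return 1;
    return 0;
}

/* Returns 1 when at most one time field carries an increment, 0 otherwise. */
int time_increments_valid(const FieldToken *tokens, int n)
{
    int with_increment = 0;
    for (int i = 0; i < n; i++)
        if (tokens[i].has_value && is_time_field(tokens[i].name))
            with_increment++;
    return with_increment <= 1;
}
```

Applied to the complex example above, the sketch finds 16 fields, with [-H=12] carrying the value 12 and [MN=5] being the only time field with an increment, so the string passes the check; "[HH=1][MN=5][C]" would be rejected.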
If more than one field specifies a time increment, an error is returned. This is sufficient to support all practical intra-day periodicities. Only natural time boundaries are valid:
| Fields for 'param.location' | Description |
|---|---|
| CAT | Extract a string representing the "category". |
| SYM | Extract a string representing the "symbol". |
These fields can extract a sub-string from any place in the path. They act like wildcards that are replaced by the extracted value for each ASCII file. Both fields can be extracted simultaneously.
Note 1: When the [CAT] field is NOT specified, the 'param.category' parameter of FD_AddDataSource is used by default. If that parameter is NULL, the default "ZZ.OTHER.OTHER" category is used.
Note 2: When the [SYM] field is NOT specified, the first portion of the filename (before the first '.') is used as the symbol for each applicable file.
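Notes 1 and 2 describe two fallback chains. Here is a minimal C sketch of both, assuming paths use backslash or '/' as directory separators. default_symbol and resolve_category are hypothetical helper names, not FIDAL functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the default symbol is the part of the file name
   before the first '.'. Both backslash and '/' are accepted as directory
   separators. out_size must be at least 1; longer symbols are truncated. */
void default_symbol(const char *path, char *out, size_t out_size)
{
    /* Find the start of the last path component. */
    const char *name = path;
    for (const char *p = path; *p != '\0'; p++)
        if (*p == '\\' || *p == '/')
            name = p + 1;

    size_t len = strcspn(name, ".");
    if (len >= out_size)
        len = out_size - 1;
    memcpy(out, name, len);
    out[len] = '\0';
}

/* Hypothetical helper: the [CAT] value wins, then param.category,
   then the "ZZ.OTHER.OTHER" default. NULL means "not available". */
const char *resolve_category(const char *extracted_cat, const char *param_category)
{
    if (extracted_cat != NULL)
        return extracted_cat;
    if (param_category != NULL)
        return param_category;
    return "ZZ.OTHER.OTHER";
}
```

For instance, "c:\\db\\IBM.daily.txt" gives the symbol "IBM", since only the portion before the first '.' is kept.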
| Examples of 'param.location' | Description |
|---|---|
| "c:\db\[CAT]\[SYM].csv" | This is probably the most common usage of these fields. Each directory immediately under "db" represents a category, and each ".csv" file in those directories represents one symbol. That symbol is added to the unified database under the corresponding category. In other words, you can create a "NASDAQ" and an "AMEX" directory and simply put the ASCII files in them. All these files are automatically added to the unified database, using the first portion of their names as the symbol name and the exchange (directory) name as the category string. |
| "c:\db\[SYM].txt" | Extracts all .txt files and uses the first part of each filename as the symbol. This is equivalent to "c:\db\*.txt". |
| "c:\db\[SYM]\price.dat" | Uses the directory name as the symbol name. Every directory immediately under "db" is searched for a "price.dat" file. |
| "c:\db\sym_[SYM]_.dat" | Only takes the string between "sym_" and the trailing "_" as the symbol name. |
| "c:\db\??[SYM].*" and "c:\db\*\C?[CAT]\[SYM]" | Fields can be combined with '?' wildcards. |
| "c:\db\*[CAT]\file.dat" | Although it is technically possible to combine a field with a '*', it is basically useless: the '*' is ignored and [CAT] absorbs all the characters. |
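The "sym_[SYM]_.dat" and "??[SYM]" rows can be read as a prefix/suffix match around the field. The sketch below assumes a single file-name component containing exactly one [SYM] field, with literal characters and '?' wildcards on either side; '*' is deliberately not handled. extract_symbol is a hypothetical name, and this is not necessarily how FIDAL implements the matching.

```c
#include <stddef.h>
#include <string.h>

/* Compares n characters; '?' in the pattern matches any single character. */
static int match_literal(const char *pat, const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pat[i] != '?' && pat[i] != s[i])
            return 0;
    return 1;
}

/* Hypothetical sketch: matches one file name against a pattern holding
   exactly one "[SYM]" field plus literals and '?' wildcards ('*' is not
   handled). On success copies the symbol into out and returns 1. */
int extract_symbol(const char *pattern, const char *name, char *out, size_t out_size)
{
    const char *field = strstr(pattern, "[SYM]");
    if (field == NULL)
        return 0;

    size_t prefix_len = (size_t)(field - pattern);
    const char *suffix = field + strlen("[SYM]");
    size_t suffix_len = strlen(suffix);
    size_t name_len = strlen(name);

    /* The symbol itself must be at least one character long. */
    if (name_len <= prefix_len + suffix_len)
        return 0;
    if (!match_literal(pattern, name, prefix_len))
        return 0;
    if (!match_literal(suffix, name + name_len - suffix_len, suffix_len))
        return 0;

    size_t sym_len = name_len - prefix_len - suffix_len;
    if (sym_len >= out_size)
        return 0;
    memcpy(out, name + prefix_len, sym_len);
    out[sym_len] = '\0';
    return 1;
}
```

With this approach "sym_MSFT_.dat" yields "MSFT", while "price.dat" does not match "sym_[SYM]_.dat" at all and would simply be skipped.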
In the previous section, we saw how to extract the category using [CAT]. It is also possible to divide the category into 3 components, extracted from 3 different portions of the path.
For example, someone may organize their ASCII files in a hierarchical structure like the following:
US <DIR>
    NASDAQ <DIR>
        STOCK <DIR>
            ...put here all files...
        FUND <DIR>
            ...put here all files...
    NYSE <DIR>
        STOCK <DIR>
            ...put here all files...
With this example, setting param.location to "[CATC]\[CATX]\[CATT]\*" includes all the files in these subdirectories in one of the following unified database categories: "US.NASDAQ.STOCK", "US.NASDAQ.FUND" and "US.NYSE.STOCK".
| Additional fields for 'param.location' | Description |
|---|---|
| CATC | Extract a string representing the category country. |
| CATX | Extract a string representing the category exchange. |
| CATT | Extract a string representing the category type. |
These 3 fields are concatenated with a '.' to form the category as suggested in the category guideline document: <CATC>.<CATX>.<CATT>.
You do not ALWAYS have to extract all 3 sub-components from the path. The defaults are: CATC="ZZ", CATX="OTHER", CATT="OTHER". The defaults can be overridden with the optional FD_AddDataSource parameters (param.country, param.exchange, param.type).
Important: The field [CAT] must not be used when one of the [CATC], [CATX] or [CATT] field is used.
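The three category components and their defaults combine as follows. This sketch assumes the order of precedence described above (value extracted from the path, then the param.country/param.exchange/param.type override, then the built-in default); build_category is a hypothetical helper, not part of FIDAL.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* First non-NULL value wins: extracted from the path, then the
   FD_AddDataSource override, then the built-in default. */
static const char *pick(const char *extracted, const char *override, const char *fallback)
{
    if (extracted != NULL)
        return extracted;
    if (override != NULL)
        return override;
    return fallback;
}

/* Hypothetical helper: writes "<CATC>.<CATX>.<CATT>" into out.
   Returns 0 on success, -1 if the buffer is too small. */
int build_category(char *out, size_t out_size,
                   const char *catc, const char *catx, const char *catt,
                   const char *country, const char *exchange, const char *type)
{
    int n = snprintf(out, out_size, "%s.%s.%s",
                     pick(catc, country, "ZZ"),
                     pick(catx, exchange, "OTHER"),
                     pick(catt, type, "OTHER"));
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With the hierarchy above, a file under "US\NYSE\STOCK" ends up in "US.NYSE.STOCK". In the example that follows, only [CATX] comes from the path, while the other two components come from param.country and param.type.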
Example:
In this example, someone has two directories with stocks from NASDAQ and NYSE. That person wishes to add all his local data to a unified database while respecting the category guideline. Assuming all the files are in the directories "C:\NASDAQ" and "C:\NYSE", he should do the following:
FD_AddDataSourceParam param;
memset( &param, 0, sizeof( FD_AddDataSourceParam ) );
param.id = FD_ASCII_FILE;
param.location = "C:\\[CATX]\\*.TXT"; /* each '\' is doubled inside a C string */
param.info = FD_DOHLCV;
param.country = "US";
param.type = "STOCK";
FD_AddDataSource( &unifiedDatabase, &param );
This adds all the .TXT files to the categories "US.NASDAQ.STOCK" and "US.NYSE.STOCK".
4.0 FD_AddDataSource Parameters Details
Here is a quick overview of how each FD_AddDataSource parameter is used for an ASCII data source:
'param.id'
Must be FD_ASCII_FILE.
'param.location'
The path of the files to be included (as explained in the previous sections).
'param.info'
The pre-defined format (like FD_DOHLCV) or a custom field string, as explained in the previous sections.
'param.category'
The default category is "ZZ.OTHER.OTHER", unless it is overridden by this parameter. This parameter is ignored if one of the category fields is used: [CAT], [CATC], [CATX] or [CATT].
'param.country', 'param.exchange', 'param.type'
Allow redefining the default value of the [CATC], [CATX] and [CATT] fields respectively.
'param.username', 'param.password'
Unused. Must be NULL.
'param.symbol'
Unused. Must be NULL.
'param.period'
Unused. Must be NULL.
'param.flags'
For the time being, ASCII data sources are read-only and do not offer more advanced features. This parameter shall be FD_NO_FLAGS. On the other hand, expect the ASCII file format to be among the first to allow write/update capability in the short term.
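The rules above amount to a short checklist. The sketch below expresses it in C against a hypothetical stand-in structure, AsciiSourceParams. It is not the real FD_AddDataSourceParam, whose exact layout and constants (FD_ASCII_FILE, FD_NO_FLAGS) are defined in "fidal.h"; here they are only mimicked by a boolean and a zero flag value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parameters listed above; this is NOT
   the real FD_AddDataSourceParam layout. */
typedef struct {
    int         is_ascii_source; /* stands in for param.id == FD_ASCII_FILE */
    const char *location;
    const char *info;
    const char *category;
    const char *username;
    const char *password;
    const char *symbol;
    unsigned    flags;           /* 0 stands in for FD_NO_FLAGS */
} AsciiSourceParams;

/* Returns 1 when the parameters follow the rules listed above, 0 otherwise. */
int ascii_params_valid(const AsciiSourceParams *p)
{
    if (!p->is_ascii_source)
        return 0;
    if (p->location == NULL || p->info == NULL)
        return 0;
    /* username, password and symbol are unused and must be NULL. */
    if (p->username != NULL || p->password != NULL || p->symbol != NULL)
        return 0;
    return p->flags == 0;
}

/* param.category only matters when the path contains no category field. */
int category_param_used(const AsciiSourceParams *p)
{
    static const char *const fields[] = { "[CAT]", "[CATC]", "[CATX]", "[CATT]" };
    for (size_t i = 0; i < sizeof fields / sizeof fields[0]; i++)
        if (strstr(p->location, fields[i]) != NULL)
            return 0;
    return 1;
}
```

For the "C:\\[CATX]\\*.TXT" example, such a check would accept the parameters and report that param.category is ignored, because [CATX] appears in the path.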