Size Limitations for .xlsx, .csv, or .txt input files

The maximum number of columns (features, variables, etc.) that Explorer CE can import is 655. This limit comes from the maximum number of columns in the datagrid used in Explorer CE. There is no limit on the number of observations. If the input file contains more than 655 columns, an error message will appear and the import will be aborted.
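
Because an over-wide file is rejected only at import time, it can be convenient to count the columns beforehand. The following is a minimal sketch in Python, not part of Explorer CE. The function names `count_columns` and `fits_column_limit` are hypothetical, and the sketch assumes a UTF-8 delimited text file whose first row is representative of the column count.

```python
import csv

MAX_COLUMNS = 655  # column limit stated above for Explorer CE's datagrid


def count_columns(path, delimiter=","):
    """Return the number of fields in the first row of a delimited text file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # Only the first row is read, so this stays fast for large files.
        first_row = next(csv.reader(handle, delimiter=delimiter), [])
    return len(first_row)


def fits_column_limit(path, delimiter=",", limit=MAX_COLUMNS):
    """Return True if the first row has no more than `limit` fields."""
    return count_columns(path, delimiter) <= limit
```

For a tab-delimited .txt file, pass `delimiter="\t"`.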

Import times for these input file formats, from fastest to slowest, are:

  1. Comma-delimited (.csv)
  2. Tab-delimited text (.txt)
  3. Excel (.xlsx)
Size Limitations for .accdb (MS Access) input files

MS Access files (.accdb) are limited to 255 columns, which is fewer than the 655 columns allowed in Explorer CE's datagrid.

Data import via MS Access files is much faster than .csv, .txt, or .xlsx (Excel), so users are encouraged to use MS Access for input files when possible. 
Setting Default Preferences when Importing Large .csv, .txt, or .xlsx files

By default, Explorer CE will do the following when importing .csv, .txt, or .xlsx files:

  1. Empty any cells that contain both numbers and character strings, since such mixed content can result in formatting problems.
  2. Determine if a column contains mostly numbers or character strings.  
  3. For columns determined to be numeric, Explorer CE will determine whether fewer than 10 discrete values (default) are identified and, if so, designate the feature as categorical (nominal).
  4. If more than 10 unique values of a numeric feature are identified, the feature will be designated as continuous.
  5. If a numeric feature is determined to only hold zeroes and ones (0,1), then the feature is designated as binary.
  6. For features determined to have 90% or more values that are string characters, the feature is designated as text.
  7. The default missing code is an empty cell, or a string value of "" with no spaces. Empty cells are identified and set to DBNull.
  8. Columns (features) consisting of calendar dates will be designated as text.
  9. Columns whose values are all the same will be designated as invariant and will have a scale icon that indicates an error. Invariant features are not usable for analyses, since they carry no information.
The above operations can add significant overhead when inputting large .csv, .txt, or .xlsx files. Therefore, for large files that contain no text strings and only numeric data, the filtering described above can be turned off by using the default settings listed under "large file input" on the default preferences page.
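
To make the source of this overhead concrete, the default designation rules listed above can be approximated in a few lines of Python. This is an illustrative sketch only, not Explorer CE's actual implementation. The function name `designate_feature` and the order in which the checks run are assumptions, as is the handling of cases the rules leave open: a column with exactly 10 distinct numeric values is treated here as continuous, and a column with no non-missing values is reported as "empty".

```python
def _is_number(value):
    """Return True if the string can be parsed as a floating-point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def designate_feature(values, max_categories=10, text_fraction=0.90):
    """Designate one column, given its cells as strings ("" means missing)."""
    present = [v for v in values if v != ""]  # drop the default missing code
    if not present:
        return "empty"  # assumption: this case is not covered by the rules
    if len(set(present)) == 1:
        return "invariant"  # every value is the same
    strings = [v for v in present if not _is_number(v)]
    if len(strings) / len(present) >= text_fraction:
        return "text"  # 90% or more string values; calendar dates land here
    numbers = {float(v) for v in present if _is_number(v)}
    if numbers <= {0.0, 1.0}:
        return "binary"  # only zeroes and ones
    if len(numbers) < max_categories:
        return "categorical"  # fewer than 10 discrete values (default)
    return "continuous"  # assumption: exactly 10 values also ends up here
```

Every cell in every column is visited at least once, which is why skipping these checks can help with large, purely numeric files.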

MS Access files, on the other hand, do not undergo the filtering described above, since they are usually of higher quality and can preserve their feature formats after input to Explorer CE. Thus, changing the default settings for large file input only applies to .csv, .txt, and .xlsx files.
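
Before turning the filtering off for a large .csv or .txt file, it may be worth confirming that the file really contains only numeric data. The sketch below is one way to do that outside Explorer CE. The function name `is_all_numeric` is hypothetical, and the sketch assumes a UTF-8 file with a single header row. It also makes the conservative assumption that an empty cell counts as non-numeric.

```python
import csv


def is_all_numeric(path, delimiter=",", has_header=True):
    """Return True if every data cell in the file parses as a number."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        if has_header:
            next(reader, None)  # skip the row of column names
        for row in reader:
            for cell in row:
                try:
                    float(cell)
                except ValueError:
                    # A text string or an empty cell was found.
                    return False
    return True
```

The scan stops at the first non-numeric cell, so a file that fails the check is usually rejected quickly.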
Comma-delimited files are the second-fastest after MS Access files

Once you have set the default settings for inputting large .csv, .txt, or .xlsx files, you should prefer the comma-delimited file format (.csv) for input, since .csv files are second only to MS Access files in import speed. When using .csv files, be sure to specify .csv in the filename popup window, as illustrated below: