searchbox is a complete toolbox that lets you set up a complete content gathering system in very little time. A content gathering system differs from a standard search engine mainly because it can gather information from many different source types, not only from the Web. This makes a content gathering system useful for both enterprise and web search.
searchbox also adds some proprietary features for metadata management. Through its Rendering module, searchbox can automatically associate metadata with single documents and use them as retrieval handles. Metadata can also be added offline in a structured way using the Metadata template approach.
searchbox also provides an interesting set of publishing options. Besides the standard query language, it can perform multi-format notification of newly gathered content by creating specific channels.
The following picture shows an overall functional schema of searchbox features.
Gathering. searchbox uses some basic concepts to manage gathering from local and remote information sources. The configuration of gathering agents is usually done once, when a new Source has to be added to the gathering system or its behaviour is changed.
Rendering. This phase depends heavily on the feature extractors installed in your searchbox instance. In general terms, the rendering action is performed automatically by searchbox every time a new document is fetched. At rendering time all feature extractors are applied to the document, and the information they produce is injected into the document itself as structured metadata.
Editing. Once contents are indexed and/or locally cached, users can inject structured metadata using the standard SOAP interface. Each user can have a pre-configured library of fixed or parametric metadata to apply to single documents. Metadata added in this way can be used in queries exactly like other metadata.
Publishing. searchbox is an "always on" system: once a new document is fetched, it is immediately rendered and published to final users. searchbox provides both pull and push publishing through its query and notification interfaces. While searchbox is up, all publishing services are always available.
From a technological point of view, searchbox is a very sophisticated piece of software, but any user can easily configure it through a few powerful basic concepts.
A Source is a place where information resides. A source can be thematic, focused on a specific topic, or completely generic, with information on many topics at the same time. If you need sports news, a sports newspaper is certainly a good candidate Source, but if you need information about health, the best choice is probably a medical magazine. A library is also a good information source, even if it is usually not digital; the hard disk of your personal computer and a Compact Disc, by contrast, are information sources in digital format. A source is called a Digital Source when a computer can acquire information from it using a specific protocol.
searchbox can acquire information from many types of digital sources. It only needs to know some information about the source itself, such as its physical address, the access protocol and, if needed, the user identification credentials.
The types of Sources that searchbox can manage are numerous enough to outline a very interesting application context.
The SB "range of action" starts from the server where SB is installed and extends to every reachable resource. The first layer contains local resources such as email folders and local disks. Depending on the access privileges granted to the system, SB can configure all these local resources as Sources.
Local network resources can also be configured by SB as Sources. In this case too, suitable access privileges must be granted.
The most "remote" resources reachable by SB are those on the Internet. SB can natively access Web sites, newsgroup servers, Web Services, RSS streams, etc., so that the concept of Source can be generalized as "any physically reachable resource known by its URL".
The rate at which new information can be gathered from an information source defines how dynamic that source is. A novel is a static information source; an encyclopedia is more dynamic because of its annual updates; a newspaper is very dynamic due to its daily editions. The concept of "edition" does not apply well to digital sources, where there is often a continuous stream of news: the electronic versions of newspapers available on the Web are updated in real time, as soon as news reaches the editorial office.
searchbox can record the history of all the information produced by a Source. For every Source we can create a multi-level cache where each level represents a time slice of the Source. All the search capabilities of searchbox are effective on this type of Archive.
Figure 5.3. The three dimensional structure of a searchbox Digital Source: the (x,y) space and its time evolution (t)

Thanks to its multi-level caching system, searchbox is an ideal tool for implementing "mining" and dynamic content monitoring applications.
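The time-sliced cache described above can be pictured with a small model. This is only a sketch under stated assumptions: the `SourceHistory` class, its method names and the choice of one slice per day are hypothetical and are not part of the searchbox interface.

```python
from collections import defaultdict
from datetime import date


class SourceHistory:
    """Hypothetical multi-level cache: one level per time slice (here, one per day)."""

    def __init__(self):
        # slice date -> {document URL: content fetched during that slice}
        self._levels = defaultdict(dict)

    def record(self, day, url, content):
        """Store the content seen for `url` during the slice `day`."""
        self._levels[day][url] = content

    def snapshot(self, day):
        """Return the state of the whole source for a single time slice."""
        return dict(self._levels.get(day, {}))

    def versions(self, url):
        """Return the time evolution of a single document, oldest first."""
        return [
            (day, level[url])
            for day, level in sorted(self._levels.items())
            if url in level
        ]


history = SourceHistory()
history.record(date(2005, 3, 1), "http://example.com/news", "first edition")
history.record(date(2005, 3, 2), "http://example.com/news", "second edition")
for day, content in history.versions("http://example.com/news"):
    print(day.isoformat(), content)
```

Keeping each slice separate is what makes monitoring possible: two snapshots of the same Source can be compared to see what changed between them.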
A Collection aggregates many information Sources (and their Archives) into a single access point. The Collection supports a standard query language that combines keywords and attributes with the standard AND, OR, NOT and NEAR operators, as in many other Internet search engines.
Using a Collection object it is possible to create a specialized index from contents coming only from selected information Sources.
This last basic concept lets searchbox be used both as a powerful content monitoring tool and as a news channel.
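The semantics of the four operators can be illustrated with a toy positional index. This is a sketch for illustration only, not the searchbox query engine or its syntax: the function names, the sample documents and the NEAR distance are all assumptions.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def term(index, word):
    """Set of documents that contain `word`."""
    return set(index.get(word.lower(), {}))


def near(index, a, b, distance=3):
    """Documents where `a` and `b` occur within `distance` words of each other."""
    pos_a, pos_b = index.get(a.lower(), {}), index.get(b.lower(), {})
    hits = set()
    for doc_id in set(pos_a) & set(pos_b):
        if any(abs(i - j) <= distance for i in pos_a[doc_id] for j in pos_b[doc_id]):
            hits.add(doc_id)
    return hits


docs = {
    1: "searchbox gathers news from web sources",
    2: "a web crawler reads news",
    3: "local disks are sources too",
}
index = build_index(docs)

and_result = term(index, "news") & term(index, "web")       # news AND web
or_result = term(index, "crawler") | term(index, "disks")   # crawler OR disks
not_result = term(index, "sources") - term(index, "web")    # sources NOT web
near_result = near(index, "web", "news", distance=2)        # web NEAR news

print(and_result, or_result, not_result, near_result)
```

AND, OR and NOT reduce to set intersection, union and difference, while NEAR additionally needs word positions, which is why the index stores them.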
The Watch is a view on a Collection defined by a persistent query. Every time the Watch is shown, it produces the list of the newest information contained in the corresponding Archives, filtered by the persistent query and ranked by timestamp. The output of the Watch can be considered a "press review" of the most recent interesting news coming from some information Sources.
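A minimal sketch of this behaviour follows. It assumes the persistent query can be modelled as a predicate over documents; the `Document` and `Watch` classes and their fields are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass
class Document:
    url: str
    text: str
    fetched_at: datetime


@dataclass
class Watch:
    # The persistent query, modelled here as a plain predicate (an assumption).
    persistent_query: Callable[[Document], bool]

    def view(self, collection: List[Document], limit: int = 10) -> List[Document]:
        """Filter the collection by the persistent query, newest documents first."""
        matching = [doc for doc in collection if self.persistent_query(doc)]
        matching.sort(key=lambda doc: doc.fetched_at, reverse=True)
        return matching[:limit]


collection = [
    Document("http://example.com/a", "football results", datetime(2005, 3, 1, 9, 0)),
    Document("http://example.com/b", "health advice", datetime(2005, 3, 1, 10, 0)),
    Document("http://example.com/c", "football transfers", datetime(2005, 3, 2, 8, 0)),
]
sport_watch = Watch(persistent_query=lambda doc: "football" in doc.text)
press_review = [doc.url for doc in sport_watch.view(collection)]
print(press_review)
```

Because the query is stored with the Watch, the same view can be re-evaluated at any time and always returns the most recent matches.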
The metadata template object was introduced in searchbox release 2.2.x. A metadata template is a set of metadata that can be added to any single document through a SOAP call made by an external application (or by the Control Panel itself). These metadata differ structurally from those applied by plugins because they must be defined beforehand and can also be retracted from the document at any time. Two types of metadata template are available:
Static. All metadata values of the template are defined at configuration time.
Parametric. At least one metadata value is a parameter that must be specified when the set of metadata referred to by the template is applied.
Metadata templates, like any other searchbox component, follow the visibility rules imposed by the ACL system, so that different users can have their own private metadata libraries.
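The difference between static and parametric templates, and the fact that template metadata can be retracted, can be sketched as follows. This is a local model only: in searchbox templates are applied through SOAP calls, and the `MetadataTemplate` class, its fields and the dictionary-based document are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MetadataTemplate:
    name: str
    values: Dict[str, str]                              # fixed at configuration time
    parameters: Set[str] = field(default_factory=set)   # supplied when applied

    @property
    def is_parametric(self) -> bool:
        return bool(self.parameters)

    def apply(self, document: dict, **params: str) -> None:
        """Attach the template's metadata to a document."""
        missing = self.parameters - set(params)
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")
        resolved = {key: params[key] for key in self.parameters}
        document.setdefault("metadata", {})[self.name] = {**self.values, **resolved}

    def retract(self, document: dict) -> None:
        """Remove the metadata previously added by this template."""
        document.get("metadata", {}).pop(self.name, None)


static_tpl = MetadataTemplate("reviewed", {"status": "approved"})
param_tpl = MetadataTemplate("assignment", {"queue": "legal"}, {"owner"})

doc = {"url": "http://example.com/contract.pdf"}
static_tpl.apply(doc)
param_tpl.apply(doc, owner="alice")
static_tpl.retract(doc)
print(doc["metadata"])
```

Storing the metadata under the template name is one simple way to make retraction possible without touching metadata that came from other templates or from plugins.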
Specific functions can be implemented in searchbox using the Plugin System. searchbox can be extended in three different ways:
More gathering protocols. By default searchbox supports many of the most widespread types of document sources (Web sites, FTP, POP/IMAP, WebDAV, etc.), but it is possible to write a custom gathering agent as a software plugin. This type of plugin is available starting from release 2.1.x.
More document formats. This type of plugin implements parsers for specific document formats not natively supported by searchbox. It is available starting from release 2.1.x.
Custom feature extraction modules. At gathering time each document is analyzed by a set of modules organized as a pipeline. Each module is specialized in recognizing a specific document feature; once such a feature is detected, specific metadata is added to the document itself so that it can be indexed. Some basic feature extractors are included by default in the searchbox package.
All plugins must be written as native DLLs for the target platform (see the plugin documentation in the Programmer's Guide).
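The pipeline idea behind feature extraction modules can be sketched in a few lines. Real searchbox plugins are native DLLs, so the Python below only illustrates the control flow; the extractor functions, the `render` helper and the dictionary-based document are hypothetical.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Extractor = Callable[[Document], Dict[str, object]]


def word_list_extractor(doc: Document) -> Dict[str, object]:
    """Basic feature: the list of distinct words, usable for full-text indexing."""
    words = sorted(set(str(doc["text"]).lower().split()))
    return {"words": words}


def word_count_extractor(doc: Document) -> Dict[str, object]:
    """A second, purely illustrative feature: the number of words."""
    return {"word_count": len(str(doc["text"]).split())}


def render(doc: Document, pipeline: List[Extractor]) -> Document:
    """Apply every extractor in order and attach the results as metadata."""
    metadata = dict(doc.get("metadata", {}))
    for extractor in pipeline:
        metadata.update(extractor(doc))
    return {**doc, "metadata": metadata}


fetched = {"url": "http://example.com/page", "text": "Computer news and computer reviews"}
rendered = render(fetched, [word_list_extractor, word_count_extractor])
print(rendered["metadata"])
```

Each extractor only needs to know about the feature it detects, so new modules can be added to the list without changing the others.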
The following picture shows how the above concepts are connected to each other.
They are organized in three different groups:
Gathering group. The source is the central gathering concept. The goal of a source is to group a certain number of seeds and to configure a suitable access protocol. A seed can be a database table, a Web page, a complete Web site, a specific portion of the Web or a fully custom data repository. The source natively supports access to seeds with digital certificates, passwords, cookies, etc. A seed can be shared by many sources.
Indexing group. An archive represents an index and a repository of contents coming from a given source. Multiple archives can be connected to a single source, since every archive can have different configuration rules for its creation (e.g. caching or not, different access credentials for different users, etc.). Finally, in order to create indexes from different kinds of sources, many archives can be grouped together to form a collection. The collection is a way to aggregate sources that are heterogeneous in terms of seeds but homogeneous in terms of topic (e.g. all the Italian newspapers). Both archives and collections are queryable objects.
Monitoring group. searchbox can also be used as a monitoring tool. Watches contain a set of static filters on the information stream coming from a collection. A watch can be used to implement a customized view on any information stream originating from a group of dynamic sources through the corresponding collections. Watches support subscriptions from any client application that needs to be alerted as soon as a specific condition is matched.
Gathering data from original sources is one of the main problems in digital content integration and delivery. A very typical scenario is having to gather information from many heterogeneous digital sources that are also geographically distributed. The owners of such digital sources are focused on their original mission of content production and usually do not provide a standard way for other applications to access their archives. This situation is due to many factors, but it is easy to understand: information is the main asset of a content provider, who wants strict control over how it is delivered. As a result, in many cases content providers do not really care whether users want to access their information from other applications through standard protocols and formats. This is far from ideal for a user who has to interface with many content providers, because he/she is forced to set up and maintain a custom communication channel with each one. Such channels are characterized by custom user interfaces and are often very hard to integrate into other applications. A possible solution to this kind of problem comes from a custom adaptation of the approach currently used by Web search engines, combined with Web Service technology.
Web search engines cannot influence in any way how web sites publish their information, so if an engine wants to build an index of the content provided by a site, it must access the web site on its own. The method used by search engines to accomplish that task is called “crawling” or “spidering”. A web crawler is a software agent that simulates a real user accessing a web site and reads all the information contained in it. In order to succeed at this task, a crawler must have a toolbox with every possible “adapter” needed to match all the access protocols and document formats available on the web. Not many years after its birth, the WWW began to support protocols other than the original HTTP and many document formats other than HTML. Formats like PDF, DOC and Flash, and protocols like NNTP, FTP and ODBC (some of which actually predate the standard Web medium of HTML over HTTP), forced Web crawlers to adapt to the new situation. The basic assumption of a typical Web crawler is that any information source must be treated like a “black box”, with no way to contact the webmaster and ask him/her to adapt content for a specific usage. From the Web source's point of view, a Web crawler is like any other normal user visiting the site. This approach is very powerful because it has zero organizational and technical impact on the information sources, and for this reason it has been successfully adopted in the enterprise environment too. In any large company or public administration, the goal of aggregating content from different and heterogeneous sources (even if they are located in and managed by the company itself) is really hard to accomplish. Exporting data from an existing database means that either or both of the organizations providing and using the content have to obtain the necessary authorizations, write some software and thus allocate human resources.
All these reasons are serious potential points of failure for any content integration project. In this type of scenario, crawling technology can enormously simplify the integration task, because the crawler acts exactly like any other authorized user whose access procedures are already defined and accepted by all departments of the company.
An interesting way to visualize the content gathering problem is to imagine that, in order to acquire information, we have to set up a channel connecting the content provider and the users. Using the “search engine” approach already discussed, a possible solution is to create a system able to aggregate many different information sources and provide some standard application services to access them. In this way users only need to know the standard application interface provided by the gathering system.
Figure 5.7. The Content Gathering component as bridge between content providers and client application

On the left side of the above picture the heterogeneous world of content providers is sketched. Different shapes represent the different protocols and formats used to access the content. On the opposite side there is a structured repository that needs to be filled with content coming from the content providers. The middle component is the content gathering module, which chooses the right adapter to gather information from each content provider and exposes some standard services:
Able to retrieve any piece of information in the repository through a query composed of words or metadata combined with the AND, OR, NOT and NEAR operators typical of any search engine. Indexing is implemented by a fully dynamic indexing service, so that new content is taken into account as soon as it is added to the repository. No index rebuild is needed.
Used to automatically feed newly acquired contents through a channel. A very common standard like RSS can be used for this purpose.
Generates events to notify that something has changed in the repository. Alerting methods use email messages, Instant Messaging, SMS and Web Service calls.
The above services can be used by a client side component to build any kind of structured object based on the original “raw” information gathered from content providers. Obviously any type of structure provided by the content provider itself will be preserved and indexed too.
From the searchbox point of view, a freshly fetched document is a completely opaque item, a binary object that must be properly treated in order to let users retrieve it later using its specific features.
A good representation of a group of fetched documents to be processed by searchbox is shown here.
A group of spheres with very smooth surfaces, with no way to distinguish one from another.
The rendering process consists of analysing the content of each document and revealing all of its possible features.
After the rendering process, the spheres of our example have some "handles". Such handles are unique to each document but must belong to a specific kind of feature. Usually the set of features that can be extracted from a document depends on the type of the document itself. A very basic type of feature that can be extracted from text documents is the "list of words" used to implement full-text indexing. For other types of documents, such as pictures and videos, other possible features are "author", "duration", "category" and so on.
searchbox implements by default only some of those feature extractors (e.g. the "list of words" extractor for full-text search), together with a special plug-in system that accepts custom processing modules for specific document formats.
Once our documents have their handles revealed, the searchbox query engine can easily use them to retrieve documents that share one or more characteristics.
The above picture is a visual representation of the handle-based retrieval model implemented by searchbox. Each ring is a query and contains a chain of spheres (documents) that share a specific feature (e.g. all documents that contain the word "computer", or all documents that are videos no longer than one minute).
searchbox differs deeply from a traditional relational DBMS in two key aspects:
searchbox does not have an internal structured model of the data it has to store. It builds its internal data structure dynamically, on the fly, as soon as it "sees" data for the first time. The searchbox administrator does not need to define how data are structured, only how data can be reached by an external software agent. With this approach searchbox can build its own private view of any information source, regardless of how data are structured inside the source itself.
Structured information is attached to the single instance of a document, both at fetching time and when offline editors inject specific metadata. This makes searchbox particularly suitable for dynamic document repositories with a big turnover.
searchbox also requires very little administration effort compared with any enterprise-level database, and it can usually be installed in any existing computing environment in minutes.
Even if searchbox can be used in many applications where databases are currently involved, its use should be limited to those situations where unstructured information must be gathered and/or managed (e.g. document management). In all other cases a standard database usually works well.
searchbox works differently from any other search engine on the market. searchbox can perform a retrieval task using any piece of information that it is able to extract from a digital document.
The action of searchbox is not limited to full-text retrieval; it depends on the rendering agents that are active on a specific information source. Such agents are organized as a processing pipeline that is applied to every fetched document.
The following picture is an overview of the searchbox rendering process.
The rendering process extracts from the original document a set of features that are encoded in an intermediate internal XML format called FFF (Focuseek Flexible Format). Document processing inside the rendering module is defined by a Plug-In chain, where each Plug-In is responsible for extracting a specific set of features to be indexed. If no Plug-In in the chain is active, the rendering module only extracts the text and its paragraph organization. In this case the searchbox indexing module performs a simple full-text indexing of the document.
The searchbox indexer can index all the features that plugins extract from documents and organizes them in a proprietary structure deeply connected to the rendering process described above. The basic idea is that the rendering module understands the layout structure of a page, so it can assign a specific role to each portion of text. Typical roles for a rendered text are "title", "central text" or "marginal note". A specific weight is assigned to each role, so that the ranking applied to the results of a query depends on the roles that the keywords have in the retrieved pages. A typical important role is the title, while a less important one is the footnote. searchbox models this situation by distributing the content of a page over different layers, each corresponding to a specific role and indexed separately.
The searchbox index supports this layered architecture and is composed of 31 different slices. 15 are defined by default, both as role and as associated weight, while the other 16 are customizable. Using custom rendering Plug-Ins, such slices can be populated with information extracted from the document or with generated metadata. At query time it is possible to specify which slices to use and, optionally, to modify their default weights.
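The effect of role weights on ranking can be illustrated with a toy scoring function. The roles, the weight values and the scoring formula below are assumptions chosen for the example; the actual searchbox slices, default weights and ranking function are not documented here.

```python
# Hypothetical default weights for three of the roles mentioned above.
DEFAULT_WEIGHTS = {"title": 3.0, "central text": 1.0, "marginal note": 0.5}


def score(layers, keyword, slices=None, weights=None):
    """Sum weighted keyword occurrences over the selected layers of one page.

    layers  -- {role: text} for a single rendered page
    slices  -- roles to search (all layers when None)
    weights -- per-query overrides of the default weights
    """
    effective = {**DEFAULT_WEIGHTS, **(weights or {})}
    selected = slices if slices is not None else list(layers)
    total = 0.0
    for role in selected:
        occurrences = layers.get(role, "").lower().split().count(keyword.lower())
        total += effective.get(role, 0.0) * occurrences
    return total


page = {
    "title": "Computer news",
    "central text": "a computer is a computer",
    "marginal note": "computer",
}
full_score = score(page, "computer")
title_only = score(page, "computer", slices=["title"])
boosted = score(page, "computer", slices=["title"], weights={"title": 10.0})
print(full_score, title_only, boosted)
```

Indexing each role separately is what allows a query to restrict the search to some slices or to change their weights without re-indexing the documents.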
A very powerful and innovative feature of searchbox is the ability to notify someone or something when any new information enters the system. This feature is especially useful when the archives are extremely dynamic and there is a big turnover of information.
The component that implements this feature is the Watch. For every Watch a list of notificators can be configured. A notificator is an endpoint of an external service that is expected to listen for searchbox messages about new documents satisfying the current Watch configuration.
Possible notificator types are:
RSS stream. Generated by default for all Watch results.
EMAIL message. News items are delivered in a standard email message.
IM message. Like the email message, but sent as an instant message using any of the major instant messaging protocols.
SOAP call. All new documents are passed to another Web Service using a specific SOAP call.
With this notification feature it is possible to implement a tiny but effective workflow system with searchbox.
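The relationship between a Watch and its notificators can be sketched as a simple dispatch loop. Everything here is hypothetical: the `NotifyingWatch` class, the callable-based notificators and the in-memory "outbox" lists stand in for real RSS, email, IM and SOAP endpoints, which searchbox configures through its own interfaces.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]
Notificator = Callable[[Doc], None]


class NotifyingWatch:
    """Hypothetical Watch that forwards matching new documents to its notificators."""

    def __init__(self, matches: Callable[[Doc], bool]):
        self.matches = matches
        self.notificators: List[Notificator] = []

    def subscribe(self, notificator: Notificator) -> None:
        self.notificators.append(notificator)

    def on_new_document(self, doc: Doc) -> bool:
        """Notify every endpoint if the document satisfies the Watch; report whether it did."""
        if not self.matches(doc):
            return False
        for notify in self.notificators:
            notify(doc)
        return True


# In-memory stand-ins for an email endpoint and a SOAP endpoint.
email_outbox: List[str] = []
soap_calls: List[Doc] = []

watch = NotifyingWatch(matches=lambda doc: "invoice" in doc["text"])
watch.subscribe(lambda doc: email_outbox.append(f"New document: {doc['url']}"))
watch.subscribe(soap_calls.append)

watch.on_new_document({"url": "http://example.com/1", "text": "invoice 2005-17"})
watch.on_new_document({"url": "http://example.com/2", "text": "meeting notes"})
print(email_outbox, len(soap_calls))
```

Chaining the SOAP notificator of one system to the input of another is the kind of step from which a small workflow can be built.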
searchbox supports all the most common document formats: HTML, Microsoft Word, Microsoft Excel, Microsoft PowerPoint, PDF, RTF, plain text and Internet mail messages (RFC822). Once fetched, all documents are transformed into an internal XML format (FFF) with UTF-8 representation. An important point to note is that, although searchbox uses an internal XML format for documents, it is not an XML database and does not support any XML query standard.
A single fetched document must not exceed 16MB; if it does, the extracted text will be truncated for some document formats (e.g. HTML) or null for others (e.g. MS Word).
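The size rule can be expressed as a small guard function. This is a sketch under assumptions: the interpretation of 16MB as 16 × 1024 × 1024 bytes, the set of truncating MIME types and the function name are not taken from searchbox, and only HTML and MS Word are classified because they are the only formats the text mentions.

```python
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024   # assumed meaning of "16MB"
TRUNCATING_TYPES = {"text/html"}        # formats whose text is truncated (per the text)


def usable_content(raw: bytes, mime_type: str, limit: int = MAX_DOCUMENT_BYTES) -> bytes:
    """Return the part of an oversized document that can still be processed."""
    if len(raw) <= limit:
        return raw                       # within the limit: nothing to do
    if mime_type in TRUNCATING_TYPES:
        return raw[:limit]               # e.g. HTML: keep the first `limit` bytes
    return b""                           # e.g. MS Word: no text is extracted


# A small limit keeps the demonstration cheap.
print(usable_content(b"<html>hello</html>", "text/html", limit=6))
print(usable_content(b"binary word data", "application/msword", limit=6))
```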
At this moment the supported MIME types are:
| Document type | MIME type | Notes |
|---|---|---|
| ASCII text | text/plain | All words contained in documents are indexed ignoring line endings. A short line is considered to be a paragraph break. |
| HTML | text/html | HTML is supported in all versions up to HTML 4.01; XHTML is supported in all 1.x versions. The HTML parser is generally very robust against malformed or invalid HTML. Images, style sheets, style tags and JavaScript are ignored. Framesets are supported as a source of links to the framed pages. Client-side image maps are supported as a source of links. Links generated by JavaScript and JavaScript document location changes are not supported. |
| PDF | application/pdf | PDF documents are supported in all versions up to 1.5 (Acrobat 6). Encrypted PDFs are not supported. Some PDF generators don't emit enough information to extract all the contained text as it appears on page, and PDFs with complex multicolumn layouts might result in text being extracted with a different paragraph or sentence ordering compared to the visual layout. Page boundaries are ignored. |
| Microsoft Word (Windows and Mac) | application/msword | Microsoft Word files are supported for files generated by Word 97, Word 98 (Mac), Word 2000, Word XP and Word v.X (Mac), files generated by versions of Word previous to Word 97 or Word 98 (Mac) are currently not supported. Page boundaries are ignored. Embedded documents are currently not supported. |
| Microsoft Excel (Windows and Mac) | application/vnd.ms-excel | All text contained in the sheets. Every cell is considered as a new paragraph. |
| Microsoft PowerPoint (Windows and Mac) | application/vnd.ms-powerpoint | All text contained in all slides considered as raw sequence of paragraphs. |
| Rich Text Format (RTF) | text/rtf | RTF files are supported in all versions up to RTF 1.6. Page boundaries are ignored |
| email message formatted as RFC822 | message/rfc822 | Message files are supported in their raw form, as they are transmitted among mail servers, returned by POP3 servers, stored in the Unix maildir format and in Microsoft Outlook .eml files. The Unix mbox format, the Netscape mail format and Qualcomm Eudora's mail format are composed of a sequence of RFC822 messages, and can be imported after splitting the mailbox in the single messages. Message file text is imported entirely, and any attached message, or document in any of the supported formats is imported recursively. The message text and all the recognized attachments are indexed as a single document. |
The following table lists all charsets supported by searchbox, with their aliases:
InternalConverter Name | UTR22 | IBM | WINDOWS | JAVA | IANA | MIME | Untagged Aliases |
ibm-1208 ibm-1209 ibm-5304 ibm-5305 ibm-13496 ibm-13497 ibm-17592 ibm-17593 | windows-65001UTF-8 | UTF-8 | UTF-8 | UTF-8 | cp1208 | ||
| ibm-1204 ibm-1205 |
| UTF-16 | UTF-16 ISO-10646-UCS-2 | UTF-16 | unicodecsUnicodeucs-2 | |
| ibm-1200 ibm-1201 ibm-13488 ibm-13489 ibm-17584 ibm-17585 ibm-21680 ibm-21681 ibm-61955 ibm-61956 | windows-1201 | UTF-16BEx-utf-16be | UTF-16BE | UTF-16BE | cp1200cp1201UTF16_BigEndian | |
| ibm-1202 ibm-1203 ibm-13490 ibm-13491 ibm-17586 ibm-17587 ibm-21682 ibm-21683 | windows-1200 | UTF-16LEx-utf-16le | UTF-16LE | UTF-16LE | UTF16_LittleEndian | |
| ibm-1236ibm-1237 |
|
| UTF-32 ISO-10646-UCS-4 | UTF-32 | csUCS4ucs-4 | |
| ibm-1232ibm-1233 |
|
| UTF-32BE |
| UTF32_BigEndian | |
| ibm-1234ibm-1235 |
|
| UTF-32LE |
| UTF32_LittleEndian | |
|
|
|
|
|
| UTF16_PlatformEndian | |
|
|
|
|
|
| UTF16_OppositeEndian | |
|
|
|
|
|
| UTF32_PlatformEndian | |
|
|
|
|
|
| UTF32_OppositeEndian | |
|
| windows-65000UTF-7 |
| UTF-7 | UTF-7 |
| |
|
|
|
|
|
| IMAP-mailbox-name | |
| ibm-1212ibm-1213 |
|
| SCSU |
|
| |
| ibm-1214ibm-1215 |
|
| BOCU-1 csBOCU-1 |
|
| |
| ibm-9400 |
|
| CESU-8 |
|
| |
| ibm-819 |
| ISO-8859-1 ibm-819 cp819 latin18859_1 csISOLatin1iso-ir-100 ISO_8859-1:1987 l1819 | ISO_8859-1:1987 ISO-8859-1 IBM819 cp819 latin1 csISOLatin1 iso-ir-100 l1 | ISO-8859-1 |
| |
|
| windows-20127 US-ASCII ASCII ANSI_X3.4-1968 ANSI_X3.4-1986 ISO_646.irv:1991 ISO646-US csASCIIcp367 | ASCIIUS-ASCII iso_646.irv:1983 ISO646-US ascii7646 | ANSI_X3.4-1968 US-ASCII ASCII ANSI_X3.4-1986 ISO_646.irv:1991 ISO646-US us csASCII iso-ir-6cp367 | US-ASCII |
| |
| ibm-1392 | windows-54936 |
| gb18030 |
|
| |
ibm-367_P100-1995 | ibm-367 |
|
| IBM367 |
|
| |
ibm-912_P100-1995 | ibm-912 | windows-28592 iso-8859-2 ISO_8859-2:1987 latin2 csISOLatin2 iso-ir-101 l2 | iso-8859-2 ibm-912ISO_8859-2:1987 latin2cs ISOLatin2 iso-ir-101 l28859_2 cp912912 | ISO_8859-2:1987 iso-8859-2 latin2 csISOLatin2 iso-ir-101 l2 | iso-8859-2 |
| |
ibm-913_P100-2000 | ibm-913 | windows-28593 iso-8859-3 ISO_8859-3:1988 latin3cs ISOLatin3 iso-ir-109 l3 | iso-8859-3 ibm-913ISO_8859-3:1988 latin3 iso-ir-109 l38859_3 cp913913 | ISO_8859-3:1988 iso-8859-3 latin3 csISOLatin3 iso-ir-109 l3 | iso-8859-3 |
| |
ibm-914_P100-1995 | ibm-914 | windows-28594 iso-8859-4 latin4cs ISOLatin4 iso-ir-110 ISO_8859-4:1988 l4 | iso-8859-4 ibm-914latin4 csISOLatin4 iso-ir-110 ISO_8859-4:1988 l48859_4 cp914914 | ISO_8859-4:1988 iso-8859-4 latin4 csISOLatin4 iso-ir-110 l4 | iso-8859-4 |
| |
ibm-915_P100-1995 | ibm-915 | windows-28595 iso-8859-5 cyrilliccs ISOLatin Cyrillic iso-ir-144 ISO_8859-5:1988 | iso-8859-5 ibm-915cyrillic csISOLatinCyrillic iso-ir-144 ISO_8859-5:19888859_5 cp915915 | ISO_8859-5:1988 iso-8859-5 cyrillic csISOLatinCyrillic iso-ir-144 | iso-8859-5 |
| |
ibm-1089_P100-1995 | ibm-1089 | windows-28596 iso-8859-6 arabiccs ISOLatin Arabic iso-ir-127 ISO_8859-6:1987 | iso-8859-6 ibm-1089arabic csISOLatinArabic iso-ir-127 ISO_8859-6:1987 ECMA-114 ASMO-7088859_6 cp10891089 | ISO_8859-6:1987 iso-8859-6 arabic csISOLatinArabic iso-ir-127ECMA-114ASMO-708 ISO-8859-6-I ISO-8859-6-E | iso-8859-6ISO-8859-6-IISO-8859-6-E |
| |
ibm-9005_X100-2005 | ibm-9005 | windows-28597 iso-8859-7 greek greek8 ELOT_928 ECMA-118 csISOLatinGreek iso-ir-126 ISO_8859-7:1987 |
| ISO_8859-7:1987 iso-8859-7 greek greek8 ELOT_928 ECMA-118 csISOLatinGreek iso-ir-126 | iso-8859-7 |
| |
ibm-813_P100-1995 | ibm-813 |
| iso-8859-7 ibm-813 greek greek8 ELOT_928 ECMA-118 csISOLatinGreek iso-ir-126 ISO_8859-7:19878859_7 cp813813 |
|
|
| |
ibm-916_P100-1995 | ibm-916 | windows-28598 iso-8859-8 hebrew csISOLatinHebrew iso-ir-138 ISO_8859-8:1988 | iso-8859-8 ibm-916 hebrew csISOLatinHebrew iso-ir-138 ISO_8859-8:19888859_8 cp916916 | ISO_8859-8:1988 iso-8859-8 hebrew csISOLatinHebrew iso-ir-138 ISO-8859-8-IISO-8859-8-E | iso-8859-8ISO-8859-8-IISO-8859-8-E |
| |
ibm-920_P100-1995 | ibm-920 | windows-28599 iso-8859-9latin5 iso-ir-148 ISO_8859-9:1989 l5 | iso-8859-9 ibm-920latin5 csISOLatin5i so-ir-148 l58859_9 cp920920 | ISO_8859-9:1989 iso-8859-9 latin5 csISOLatin5 iso-ir-148l5 | iso-8859-9 | ECMA-128 | |
ibm-921_P100-1995 | ibm-921 |
| iso-8859-138859_13 | iso-8859-13 | iso-8859-13 | cp921921 | |
ibm-923_P100-1998 | ibm-923 | windows-28605 iso-8859-15 Latin-9 l9 | iso-8859-15 ibm-9238859_15 latin0 csisolatin0 csisolatin9 iso8859_15_fdis cp923923 | iso-8859-15Latin-9 | iso-8859-15 |
| |
ibm-942_P12A-1999 | ibm-942 ibm-932 |
|
|
|
| cp932shift_jis78sjis78ibm-942_VSUB_VPUAibm-932_VSUB_VPUA | |
ibm-943_P15A-2003 |
| windows-932 Shift_JISMS_Kanjics ShiftJIScsWindows 31Jx-sjisx-ms-cp932cp932 | cp943c Shift_JISMS_Kanjics ShiftJISwindows-31j csWindows31Jx-sjis | Shift_JISMS_Kanji csShiftJISwindows-31j csWindows31J | Shift_JIS | ibm-943IBM-943Cms932pcksjisibm-943_VSUB_VPUA | |
ibm-943_P130-1999 | ibm-943 |
| cp943 ibm-943943 |
|
| Shift_JISibm-943_VASCII_VSUB_VPUA | |
ibm-33722_P12A-1999 |
| windows-51932 EUC-JP Extended _UNIX _Code _Packed _Format _for _Japanese csEUCPkdFmtJapanese X-EUC-JP | EUC-JP Extended _UNIX _Code _Packed _Format _for _Japanese csEUCPkdFmtJapanese X-EUC-JPeucjis | Extended _UNIX _Code _Packed _Format _for _Japanese EUC-JP csEUCPkdFmtJapanese | EUC-JP | ibm-33722ibm-5050ibm-33722_VPUAIBM-eucJP | |
ibm-33722_P120-1999 | ibm-33722 ibm-5050 |
| cp33722 ibm-33722 33722 |
|
| ibm-33722_VASCII_VPUA | |
ibm-954_P101-2000 | ibm-954 |
|
|
|
| EUC-JP | |
ibm-1373_P100-2002 | ibm-1373 |
|
|
|
| windows-950 | |
windows-950-2000 |
| windows-950 Big5 csBig5 | Big5 | Big5 csBig5 | Big5 | x-big5 | |
ibm-950_P110-1999 | ibm-950 |
| cp950 ibm-950950 |
|
|
| |
macos-2566-10.2 |
|
| Big5-HKSCS big5hk | Big5-HKSCS |
| HKSCS-BIG5 | |
ibm-1375_P100-2003 | ibm-1375 |
| MS950_HKSCS |
|
| Big5-HKSCS | |
ibm-1386_P100-2002 | ibm-1386 |
|
|
|
| cp1386windows-936ibm-1386_VSUB_VPUA | |
windows-936-2000 |
| windows-936GBK | GBKCP936 windows-936 | GBKCP936 MS936 windows-936 |
|
| |
ibm-1383_P110-1999 | ibm-1383 |
| cp1383 ibm-13831383 | GB2312 csGB2312 | GB2312 | EUC-CNibm-eucCNhp15CNibm-1383_VPUA | |
ibm-5478_P100-1995 | ibm-5478 |
|
| GB_2312-80chinese iso-ir-58 csISO58GB231280 |
| gb2312-1980GB2312.1980-0 | |
ibm-964_P110-1999 | ibm-964 |
| cp964 ibm-964964 |
|
| EUC-TWibm-eucTWcns11643ibm-964_VPUA | |
ibm-949_P110-1999 | ibm-949 |
| cp949 ibm-949949 |
|
| ibm-949_VASCII_VSUB_VPUA | |
ibm-949_P11A-1999 |
|
| cp949c |
|
| ibm-949ibm-949_VSUB_VPUA | |
ibm-970_P110-1995 | ibm-970 | windows-51949 EUC-KR csEUCKR | cp970 ibm-970 EUC-KRKS_C_5601-1987 ibm-euc KRKSC_5601 5601 970 | EUC-KRcsEUCKR | EUC-KR | ibm-970_VPUA | |
| ibm-971 |
|
|
|
| ibm-971_P100-1995ibm-971_VPUA | |
ibm-1363_P11B-1998 |
|
|
| KS_C_5601-1987 KS_C_5601-1989 KSC_5601 csKSC56011987 korean iso-ir-149 | KSC_5601 | ibm-13635601cp1363kscwindows-949ibm-1363_VSUB_VPUA | |
ibm-1363_P110-1997 | ibm-1363 |
|
|
|
| ibm-1363_VASCII_VSUB_VPUA | |
windows-949-2000 |
| windows-949 KS_C_5601-1987 KS_C_5601-1989 KSC_5601 csKSC56011987 korean iso-ir-149 | windows-949 ms949 |
|
|
| |
windows-874-2000 |
| windows-874 TIS-620 | windows-874 MS874 |
|
|
| |
ibm-874_P100-1995 | ibm-874 ibm-9066 |
| cp874 ibm-874 TIS-620 tis620.2533 | TIS-620 |
| eucTH | |
ibm-1162_P100-1999 | ibm-1162 |
|
|
|
|
| |
ibm-437_P100-1995 | ibm-437 | windows-437 IBM437 cp437 437 | cp437 IBM437 437 csPC8CodePage437 | IBM437 cp437 437 csPC8CodePage437 |
|
| |
ibm-737_P100-1997 | ibm-737 | windows-737 IBM737 | cp737 IBM737 737 |
|
|
| |
ibm-775_P100-1996 | ibm-775 | windows-775 IBM775 cp775 | cp775 IBM775 775 | IBM775 cp775 csPC775Baltic |
|
| |
ibm-850_P100-1995 | ibm-850 | windows-850 IBM850 cp850 | cp850 IBM850 850 csPC850Multilingual | IBM850 cp850 850 csPC850Multilingual | cp850 |
| |
ibm-851_P100-1995 | ibm-851 |
|
| IBM851 cp851 851 csPC851 | cp851 |
| |
ibm-852_P100-1995 | ibm-852 | windows-852 IBM852 cp852 852 | cp852 IBM852 852 csPCp852 | IBM852 cp852 852 csPCp852 |
|
| |
ibm-855_P100-1995 | ibm-855 |
| cp855 IBM855 csPCp855 | IBM855 cp855 855 csIBM855 |
|
| |
ibm-856_P100-1995 | ibm-856 |
| cp856 IBM856 856 |
| cp856 |
| |
ibm-857_P100-1995 | ibm-857 | windows-857 IBM857 | cp857 IBM857 857 csIBM857 | IBM857 cp857 857 csIBM857 | cp857 |
| |
ibm-858_P100-1997 | ibm-858 |
| cp858 IBM00858 CCSID00858 CP00858 | IBM00858 CCSID00858 CP00858 PC-Multilingual-850+euro | cp858 |
| |
ibm-860_P100-1995 | ibm-860 |
| cp860 IBM860 860 csIBM860 | IBM860 cp860 860 csIBM860 | cp860 |
| |
ibm-861_P100-1995 | ibm-861 | windows-861 IBM861 | cp861 IBM861 861 cp-is csIBM861 | IBM861 cp861 861 cp-is csIBM861 | cp861 |
| |
ibm-862_P100-1995 | ibm-862 | windows-862 DOS-862 | cp862 IBM862 862 csPC862LatinHebrew | IBM862 cp862 862 csPC862LatinHebrew | cp862 |
| |
ibm-863_P100-1995 | ibm-863 |
| cp863 IBM863 863 csIBM863 | IBM863 cp863 863 csIBM863 | cp863 |
| |
ibm-864_X110-1999 | ibm-864 |
| cp864 IBM864 csIBM864 | IBM864 cp864 csIBM864 | cp864 |
| |
ibm-865_P100-1995 | ibm-865 |
| cp865 IBM865 865 csIBM865 | IBM865 cp865 865 csIBM865 | cp865 |
| |
ibm-866_P100-1995 | ibm-866 | windows-866 cp866 | cp866 IBM866 866 csIBM866 | IBM866 cp866 866 csIBM866 | cp866 |
| |
ibm-867_P100-1998 | ibm-867 |
|
|
|
|
| |
ibm-868_P100-1995 | ibm-868 |
| CP868 IBM868 868 | IBM868 CP868 csIBM868 cp-ar | CP868 |
| |
ibm-869_P100-1995 | ibm-869 | windows-869 IBM869 | cp869 IBM869 869 cp-gr csIBM869 | IBM869 cp869 869 cp-gr csIBM869 | cp869 |
| |
ibm-878_P100-1996 | ibm-878 | windows-20866 KOI8-R koi8 csKOI8R | KOI8-R koi8 csKOI8R | KOI8-R csKOI8R | KOI8-R | cp878 | |
ibm-901_P100-1999 | ibm-901 |
|
|
|
|
| |
ibm-902_P100-1999 | ibm-902 |
|
|
|
|
| |
ibm-922_P100-1999 | ibm-922 |
| cp922 IBM922 922 |
| cp922 |
| |
ibm-1168_P100-2002 | ibm-1168 | windows-21866 KOI8-U koi8-ru |
| KOI8-U |
|
| |
ibm-4909_P100-1999 | ibm-4909 |
|
|
|
|
| |
ibm-5346_P100-1998 | ibm-5346 | windows-1250 cp1250 | windows-1250 cp1250 | windows-1250 |
|
| |
ibm-5347_P100-1998 | ibm-5347 | windows-1251 cp1251 | windows-1251 cp1251 | windows-1251 |
|
| |
ibm-5348_P100-1997 | ibm-5348 | windows-1252 | windows-1252 cp1252 | windows-1252 |
|
| |
ibm-5349_P100-1998 | ibm-5349 | windows-1253 | windows-1253 cp1253 | windows-1253 |
|
| |
ibm-5350_P100-1998 | ibm-5350 | windows-1254 | windows-1254 cp1254 | windows-1254 |
|
| |
ibm-9447_P100-2002 | ibm-9447 | windows-1255 | windows-1255 cp1255 | windows-1255 |
|
| |
windows-1256-2000 |
| windows-1256 cp1256 | windows-1256 cp1256 | windows-1256 |
|
| |
ibm-9449_P100-2002 | ibm-9449 | windows-1257 | windows-1257 cp1257 | windows-1257 |
|
| |
ibm-5354_P100-1998 | ibm-5354 | windows-1258 | windows-1258 cp1258 | windows-1258 |
|
| |
ibm-1250_P100-1995 | ibm-1250 |
|
|
|
| windows-1250 | |
ibm-1251_P100-1995 | ibm-1251 |
|
|
|
| windows-1251 | |
ibm-1252_P100-2000 | ibm-1252 |
|
|
|
| windows-1252 | |
ibm-1253_P100-1995 | ibm-1253 |
|
|
|
| windows-1253 | |
ibm-1254_P100-1995 | ibm-1254 |
|
|
|
| windows-1254 | |
ibm-1255_P100-1995 | ibm-1255 |
|
|
|
|
| |
ibm-5351_P100-1998 | ibm-5351 |
|
|
|
| windows-1255 | |
ibm-1256_P110-1997 | ibm-1256 |
|
|
|
|
| |
ibm-5352_P100-1998 | ibm-5352 |
|
|
|
| windows-1256 | |
ibm-1257_P100-1995 | ibm-1257 |
|
|
|
|
| |
ibm-5353_P100-1998 | ibm-5353 |
|
|
|
| windows-1257 | |
ibm-1258_P100-1997 | ibm-1258 |
|
|
|
| windows-1258 | |
macos-0_2-10.2 |
| windows-10000 macintosh |
| macintosh mac csMacintosh | macintosh |
| |
macos-6-10.2 |
| windows-10006 x-mac-greek |
|
| x-mac-greek | macgr | |
macos-7_3-10.2 |
| windows-10007 x-mac-cyrillic |
|
| x-mac-cyrillic | maccy | |
macos-29-10.2 |
| windows-10029 x-mac-ce |
|
| x-mac-centraleurroman | macce | |
macos-35-10.2 |
| windows-10081 x-mac-turkish |
|
| x-mac-turkish | mactr | |
ibm-1051_P100-1995 | ibm-1051 |
|
| hp-roman8 roman8 r8 csHPRoman8 |
|
| |
ibm-1276_P100-1995 | ibm-1276 |
|
| Adobe-Standard-Encoding csAdobeStandardEncoding |
|
| |
ibm-1006_P100-1995 | ibm-1006 |
| cp1006 IBM1006 1006 |
|
|
| |
ibm-1098_P100-1995 | ibm-1098 |
| cp1098 IBM1098 1098 |
|
|
| |
ibm-1124_P100-1996 | ibm-1124 |
| cp1124 ibm-1124 1124 |
|
|
| |
ibm-1125_P100-1997 | ibm-1125 |
|
|
|
| cp1125 | |
ibm-1129_P100-1997 | ibm-1129 |
|
|
|
|
| |
ibm-1131_P100-1997 | ibm-1131 |
|
|
|
| cp1131 | |
ibm-1133_P100-1997 | ibm-1133 |
|
|
|
|
| |
|
|
| ISO-2022-JP csISO2022JP | ISO-2022-JP | ISO-2022-JP | ISO_2022,locale=ja,version=0 | |
|
|
|
| JIS_Encoding |
| ISO_2022,locale=ja,version=1 ISO-2022-JP-1 JIS | |
|
|
|
| ISO-2022-JP-2 | ISO-2022-JP-2 | ISO_2022,locale=ja,version=2 csISO2022JP2 | |
|
|
|
|
|
| ISO_2022,locale=ja,version=3 JIS7 csJISEncoding | |
|
|
|
|
|
| ISO_2022,locale=ja,version=4 JIS8 | |
|
|
| ISO-2022-KR csISO2022KR | ISO-2022-KR | ISO-2022-KR | ISO_2022,locale=ko,version=0 | |
|
|
|
|
|
| ISO_2022,locale=ko,version=1 ibm-25546 | |
|
|
| ISO-2022-CN csISO2022CN | ISO-2022-CN | ISO-2022-CN | ISO_2022,locale=zh,version=0 | |
|
|
|
| ISO-2022-CN-EXT | ISO-2022-CN-EXT | ISO_2022,locale=zh,version=1 | |
|
|
|
| HZ-GB-2312 | HZ-GB-2312 | HZ | |
ibm-897_P100-1995 | ibm-897 |
|
| JIS_X0201 X0201 csHalfWidthKatakana |
|
| |
|
| windows-57002 x-iscii-de |
|
|
| ISCII,version=0 iscii-dev | |
|
| windows-57003 x-iscii-be windows-57006 x-iscii-as |
|
|
| ISCII,version=1 iscii-bng | |
|
| windows-57011 x-iscii-pa |
|
|
| ISCII,version=2 iscii-gur | |
|
| windows-57010 x-iscii-gu |
|
|
| ISCII,version=3 iscii-guj | |
|
| windows-57007 x-iscii-or |
|
|
| ISCII,version=4 iscii-ori | |
|
| windows-57004 x-iscii-ta |
|
|
| ISCII,version=5 iscii-tml | |
|
| windows-57005 x-iscii-te |
|
|
| ISCII,version=6 iscii-tlg | |
|
| windows-57008 x-iscii-ka |
|
|
| ISCII,version=7 iscii-knd | |
|
| windows-57009 x-iscii-ma |
|
|
| ISCII,version=8 iscii-mlm | |
| ibm-65025 |
|
|
|
| LMBCS-1 lmbcs | |
|
|
|
|
|
| LMBCS-2 | |
|
|
|
|
|
| LMBCS-3 | |
|
|
|
|
|
| LMBCS-4 | |
|
|
|
|
|
| LMBCS-5 | |
|
|
|
|
|
| LMBCS-6 | |
|
|
|
|
|
| LMBCS-8 | |
|
|
|
|
|
| LMBCS-11 | |
|
|
|
|
|
| LMBCS-16 | |
|
|
|
|
|
| LMBCS-17 | |
|
|
|
|
|
| LMBCS-18 | |
|
|
|
|
|
| LMBCS-19 | |
ibm-37_P100-1995 | ibm-37 |
| cp037 IBM037 ebcdic-cp-us ebcdic-cp-ca ebcdic-cp-wt ebcdic-cp-nl csIBM037 037 cpibm37 | IBM037 ebcdic-cp-us ebcdic-cp-ca ebcdic-cp-wt ebcdic-cp-nl csIBM037 |
| ibm-037 cp37 | |
ibm-273_P100-1995 | ibm-273 |
| CP273 IBM273 273 | IBM273 CP273 csIBM273 |
| ebcdic-de | |
ibm-277_P100-1995 | ibm-277 |
| cp277 IBM277 277 | IBM277 EBCDIC-CP-DK EBCDIC-CP-NO csIBM277 |
| ebcdic-dk | |
ibm-278_P100-1995 | ibm-278 |
| cp278 IBM278 ebcdic-sv 278 | IBM278 ebcdic-cp-fi ebcdic-cp-se csIBM278 |
|
| |
ibm-280_P100-1995 | ibm-280 |
| CP280 IBM280 280 | IBM280 CP280 ebcdic-cp-it csIBM280 |
|
| |
ibm-284_P100-1995 | ibm-284 |
| CP284 IBM284 cpibm284 284 | IBM284 CP284 ebcdic-cp-es csIBM284 |
|
| |
ibm-285_P100-1995 | ibm-285 |
| CP285 IBM285 cpibm285 ebcdic-gb 285 | IBM285 CP285 ebcdic-cp-gb csIBM285 |
|
| |
ibm-290_P100-1995 | ibm-290 |
|
| IBM290 cp290 EBCDIC-JP-kana csIBM290 |
|
| |
ibm-297_P100-1995 | ibm-297 |
| cp297 IBM297 cpibm297 297 | IBM297 cp297 ebcdic-cp-fr csIBM297 |
|
| |
ibm-420_X120-1999 | ibm-420 |
| cp420 IBM420 420 | IBM420 cp420 ebcdic-cp-ar1 csIBM420 |
|
| |
ibm-424_P100-1995 | ibm-424 |
| cp424 IBM424 424 | IBM424 cp424 ebcdic-cp-he csIBM424 |
|
| |
ibm-500_P100-1995 | ibm-500 |
| CP500 IBM500 | IBM500 CP500 ebcdic-cp-be csIBM500 ebcdic-cp-ch |
| 500 | |
ibm-803_P100-1999 | ibm-803 |
|
|
|
| cp803 | |
ibm-838_P100-1995 | ibm-838 ibm-9030 |
| cp838 IBM838 IBM-Thai 838 | IBM-Thai csIBMThai |
|
| |
ibm-870_P100-1995 | ibm-870 |
| CP870 IBM870 | IBM870 CP870 ebcdic-cp-roece ebcdic-cp-yu csIBM870 |
|
| |
ibm-871_P100-1995 | ibm-871 |
| CP871 IBM871 ebcdic-cp-is csIBM871 ebcdic-is 871 | IBM871 ebcdic-cp-is csIBM871 CP871 |
|
| |
ibm-875_P100-1995 | ibm-875 |
| cp875 IBM875 875 |
|
|
| |
ibm-918_P100-1995 | ibm-918 |
| CP918 IBM918 | IBM918 CP918 ebcdic-cp-ar2 csIBM918 |
|
| |
ibm-930_P120-1999 | ibm-930 ibm-5026 |
| cp930 IBM930 930 |
|
|
| |
ibm-933_P110-1995 | ibm-933 |
| cp933 ibm-933 933 |
|
|
| |
ibm-935_P110-1999 | ibm-935 |
| cp935 ibm-935 935 |
|
|
| |
ibm-937_P110-1999 | ibm-937 |
| cp937 ibm-937 937 |
|
|
| |
ibm-939_P120-1999 | ibm-939 ibm-931 ibm-5035 |
| cp939 IBM939 939 |
|
|
| |
ibm-1025_P100-1995 | ibm-1025 |
| cp1025 ibm-1025 1025 |
|
|
| |
ibm-1026_P100-1995 | ibm-1026 |
| CP1026 IBM1026 1026 | IBM1026 CP1026 csIBM1026 |
|
| |
ibm-1047_P100-1995 | ibm-1047 |
| cp1047 IBM1047 1047 | IBM1047 |
|
| |
ibm-1097_P100-1995 | ibm-1097 |
| cp1097 ibm-1097 1097 |
|
|
| |
ibm-1112_P100-1995 | ibm-1112 |
| cp1112 ibm-1112 1112 |
|
|
| |
ibm-1122_P100-1999 | ibm-1122 |
| cp1122 ibm-1122 1122 |
|
|
| |
ibm-1123_P100-1995 | ibm-1123 |
| cp1123 ibm-1123 1123 |
|
|
| |
ibm-1130_P100-1997 | ibm-1130 |
|
|
|
|
| |
ibm-1132_P100-1998 | ibm-1132 |
|
|
|
|
| |
ibm-1137_P100-1999 | ibm-1137 |
|
|
|
|
| |
ibm-1140_P100-1997 | ibm-1140 |
| cp1140 IBM01140 CCSID01140 CP01140 | IBM01140 CCSID01140 CP01140 ebcdic-us-37+euro |
|
| |
ibm-1141_P100-1997 | ibm-1141 |
| cp1141 IBM01141 CCSID01141 CP01141 | IBM01141 CCSID01141 CP01141 ebcdic-de-273+euro |
|
| |
ibm-1142_P100-1997 | ibm-1142 |
| cp1142 IBM01142 CCSID01142 CP01142 | IBM01142 CCSID01142 CP01142 ebcdic-dk-277+euro ebcdic-no-277+euro |
|
| |
ibm-1143_P100-1997 | ibm-1143 |
| cp1143 IBM01143 CCSID01143 CP01143 | IBM01143 CCSID01143 CP01143 ebcdic-fi-278+euro ebcdic-se-278+euro |
|
| |
ibm-1144_P100-1997 | ibm-1144 |
| cp1144 IBM01144 CCSID01144 CP01144 | IBM01144 CCSID01144 CP01144 ebcdic-it-280+euro |
|
| |
ibm-1145_P100-1997 | ibm-1145 |
| cp1145 IBM01145 CCSID01145 CP01145 | IBM01145 CCSID01145 CP01145 ebcdic-es-284+euro |
|
| |
ibm-1146_P100-1997 | ibm-1146 |
| cp1146 IBM01146 CCSID01146 CP01146 | IBM01146 CCSID01146 CP01146 ebcdic-gb-285+euro |
|
| |
ibm-1147_P100-1997 | ibm-1147 |
| cp1147 IBM01147 CCSID01147 CP01147 | IBM01147 CCSID01147 CP01147 ebcdic-fr-297+euro |
|
| |
ibm-1148_P100-1997 | ibm-1148 |
| cp1148 IBM01148 CCSID01148 CP01148 | IBM01148 CCSID01148 CP01148 ebcdic-international-500+euro |
|
| |