Web content extraction made easy

Welcome to ΔEiXTo!

Collecting data from many unstructured web pages, often through repetitive copy-and-paste operations, is tedious and time-consuming. Wouldn't it be nice to specify the content you want from a web page once and then let an application do the laborious work for you?

ΔEiXTo (or DEiXTo) is a powerful web data extraction tool based on the W3C Document Object Model (DOM). It lets users create highly accurate extraction rules (wrappers) that describe which pieces of data to scrape from a web page. DEiXTo consists of two separate, standalone components:

  • GUI DEiXTo, a Windows application written in Turbo Delphi that provides a graphical user interface for managing extraction rules (building, testing, fine-tuning, saving and modifying them), and
  • DEiXTo Executor, a standalone extraction rule executor (a command-line utility written in Perl) that applies extraction rules to the unstructured content of HTML pages and produces structured output in a variety of formats.
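
The general idea behind a DOM-based wrapper can be sketched in a few lines of Python: a tree pattern is matched against every element of a parsed page, and the text of marked nodes is captured as record fields. This is only an illustration under stated assumptions: the `PATTERN` dictionary, the `match` and `extract` helpers and the sample page are hypothetical, the page must be well-formed XHTML so that `xml.etree.ElementTree` can parse it, and none of it reflects DEiXTo's actual rule format or matching algorithm.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: a tree pattern. Each node names a tag, may name a
# field whose text is captured, and may list child patterns matched in order.
PATTERN = {
    "tag": "div",
    "children": [
        {"tag": "h2", "field": "title"},
        {"tag": "span", "field": "price"},
    ],
}

# Hypothetical sample page; it must be well-formed XML for ElementTree.
PAGE = """<html><body>
<div><h2>Book A</h2><span>10.00</span></div>
<div><h2>Book B</h2><span>12.50</span></div>
<p>Not a product</p>
</body></html>"""


def match(node, pattern, record):
    """Return True if `node` matches `pattern`, storing captured text in `record`."""
    if node.tag != pattern["tag"]:
        return False
    if "field" in pattern:
        record[pattern["field"]] = (node.text or "").strip()
    children = list(node)
    child_patterns = pattern.get("children", [])
    if len(children) < len(child_patterns):
        return False
    # The i-th child pattern must match the i-th child element; extra
    # trailing children of `node` are ignored.
    return all(match(c, p, record) for c, p in zip(children, child_patterns))


def extract(root, pattern):
    """Visit every element of the tree and collect one record per match."""
    records = []
    for node in root.iter():
        record = {}  # fresh record, discarded if the match fails
        if match(node, pattern, record):
            records.append(record)
    return records


for rec in extract(ET.fromstring(PAGE), PATTERN):
    print(rec["title"], rec["price"], sep="\t")
```

Run as is, the loop prints one tab-separated line per matching `div` (Book A and Book B), while the `p` element is skipped because it does not fit the pattern.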

DEiXTo can handle a wide range of web sites with high precision and recall, since it provides the user with a rich set of features for constructing well-engineered extraction rules.

Web content extracted with DEiXTo can be saved as RSS, XML or tab-delimited text. Wrappers built with DEiXTo can be scheduled to run automatically, providing unattended access to resources of interest and saving users a great deal of time, energy and repetitive effort.
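
To illustrate the tab-delimited and RSS output options, here is a hedged Python sketch that serializes records which have already been extracted. The record fields, the example.com URLs and the helper names `to_tab_delimited` and `to_rss` are hypothetical; the sketch does not reproduce the exact layout DEiXTo Executor writes, and its RSS document contains only the minimal RSS 2.0 channel and item elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical extracted records (field names are illustrative only).
RECORDS = [
    {"title": "Book A", "link": "http://example.com/a", "price": "10.00"},
    {"title": "Book B", "link": "http://example.com/b", "price": "12.50"},
]
FIELDS = ["title", "link", "price"]


def to_tab_delimited(records, fields):
    """One line per record; tabs and newlines inside values become spaces."""
    lines = []
    for rec in records:
        values = [rec.get(f, "").replace("\t", " ").replace("\n", " ") for f in fields]
        lines.append("\t".join(values))
    return "\n".join(lines) + "\n"


def to_rss(records, channel_title, channel_link):
    """Build a minimal RSS 2.0 document with one <item> per record."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Records extracted from " + channel_link
    for rec in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = rec["title"]
        ET.SubElement(item, "link").text = rec["link"]
        ET.SubElement(item, "description").text = "Price: " + rec["price"]
    return ET.tostring(rss, encoding="unicode")


print(to_tab_delimited(RECORDS, FIELDS), end="")
print(to_rss(RECORDS, "Example catalogue", "http://example.com/"))
```

Replacing tabs and newlines inside values keeps each record on a single line, so the tab-delimited output can be split back into fields reliably; ElementTree takes care of the XML escaping the RSS output needs.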

ΔEiXTo was developed by Kostas Ntonas and Fotis Kokkoras under the supervision of Assistant Professor Nick Bassiliades in the Informatics Department (LPIS Group) of the Aristotle University of Thessaloniki, Greece.

A few words about the term ΔEiXTo

DEiXTo is an acronym for Data Extraction Tool.

First of all, Δ is the Greek equivalent of D. Now, you are probably wondering what this "i" character is all about. Well, in Greek "ΔEIXTO" (pronounced dechto) is the imperative form of "point at", which is exactly what the DEiXTo user does inside a browser window when building a DEiXTo extraction rule.

Now you know... ;-)

Note: If you think that certain content on our site violates copyrights that you own
or represent, then please contact us and we will remove this content.
