Explaining the software part of the system
Putting Together And Topping-Up The Dataset
- Acquisition of ready-made datasets, typically in unfriendly formats -> use of our own purpose-built software and tools to load such data into the system
- Automatic parsing and scraping of available official data (e.g. online corporate registers)
- Operator-assisted input of data from sources with database-unfriendly formats (e.g. PDF files, official publications in electronic form)
- Automatic collection of large volumes of potentially relevant data (anything on the internet, court databases, regulatory databases etc.) and pre-structuring of it for easier review by analysts -> analyst review -> manual input of the valuable parts
Teaching The Databank Valuable Knowledge
A contributor introduces a ’rule’ (e.g. there is a high chance that all Cyprus companies with Mr. So-and-so as a director are beneficially owned by, or at least somehow related to, Mr. X) -> a developer breaks it down into elements and translates it into an algorithm -> the algorithm is fed to the database, which looks for records that match the ’rule’ and enriches the matching records -> automatic application of the ’rule’ is repeated at certain intervals, because the body of the dataset changes slightly all the time
The same process is used to introduce other similar knowledge.
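The rule cycle above can be illustrated with a minimal sketch. It is not our production code: the Company class, the tag text and the function names are hypothetical and exist only for this example, which assumes a rule can be expressed as a yes/no test on a single record.

```python
from dataclasses import dataclass, field


@dataclass
class Company:
    """Hypothetical, simplified company record used only for this sketch."""
    name: str
    jurisdiction: str
    directors: list[str]
    tags: set[str] = field(default_factory=set)


def so_and_so_rule(company: Company) -> bool:
    """The contributor's example rule, broken down into two elementary checks."""
    return company.jurisdiction == "Cyprus" and "Mr. So-and-so" in company.directors


def apply_rule(records, predicate, tag):
    """Tag every record that matches the rule; return how many were newly enriched."""
    newly_enriched = 0
    for record in records:
        if predicate(record) and tag not in record.tags:
            record.tags.add(tag)
            newly_enriched += 1
    return newly_enriched


records = [
    Company("Alpha Ltd", "Cyprus", ["Mr. So-and-so"]),
    Company("Beta Ltd", "Cyprus", ["Ms. Other"]),
    Company("Gamma Ltd", "Malta", ["Mr. So-and-so"]),
]

TAG = "possibly related to Mr. X"
first_run = apply_rule(records, so_and_so_rule, TAG)

# The dataset changes over time, so the rule is simply applied again later.
records.append(Company("Delta Ltd", "Cyprus", ["Mr. So-and-so", "Ms. Other"]))
second_run = apply_rule(records, so_and_so_rule, TAG)
```

Because a record that already carries the tag is skipped, the rule can be re-run at any interval without producing duplicate enrichment.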
Any query to the database that returns an observable pattern is analyzed to see whether the pattern can be explored further and fed back into the database in a more developed and researched form. Expert knowledge is used to do this.
- Developing and constantly fine-tuning transliteration and reverse transliteration modules for the languages in which our data originates
- Creating modules for data clean-up
- Researching, debating and implementing algorithms to deal with typical mistakes, misspellings, incorrect transliteration and other data defects
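As an illustration of the kind of clean-up and defect-tolerant matching listed above, here is a minimal sketch using only the Python standard library. It is an assumption made for this example, not a description of our modules: the function names and the 0.85 threshold are hypothetical, and real transliteration handling needs language-specific tables that are not shown.

```python
import difflib
import unicodedata


def normalize_name(name: str) -> str:
    """Strip accents, case, punctuation and extra spaces from a name."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = without_marks.casefold()
    cleaned = "".join(ch if ch.isalnum() else " " for ch in lowered)
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity of two names after normalization, between 0.0 and 1.0."""
    return difflib.SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def likely_same(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two spellings as one name if they are similar enough.

    The threshold is an illustrative value, not a tuned one.
    """
    return similarity(a, b) >= threshold
```

With this sketch, likely_same("Sergey Ivanov", "Sergei Ivanov") is true, because the two transliterations differ in a single letter, while clearly different names fall below the threshold.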
Dealing With Difficult Data And ’Rules’
To address the need to store and work with large volumes of heterogeneous and unstructured data, we have developed a high-level scalable DBMS (database management system) which is a universal extension to SQL-based DBMSs (MySQL, MSSQL, PostgreSQL etc.). Its architecture allows data to be stored in a 3-dimensional format in which any record may have meta-information attached, and multiple formats for the stored data are supported.
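One possible way to picture such a record is sketched below. This is only an assumed reading of the storage model, not its actual schema: the class names, field names and meta keys are hypothetical. The idea shown is that a record holds fields, each field holds several values, and every value (and the record itself) can carry its own meta-information and format.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Value:
    """One stored value with its format and attached meta-information."""
    data: Any
    fmt: str = "text"
    meta: dict[str, Any] = field(default_factory=dict)


@dataclass
class Record:
    """A record: named fields, each holding any number of annotated values."""
    record_id: str
    fields: dict[str, list[Value]] = field(default_factory=dict)
    meta: dict[str, Any] = field(default_factory=dict)

    def add(self, name: str, data: Any, fmt: str = "text", **meta: Any) -> None:
        """Append a value to a field, keeping earlier values alongside it."""
        self.fields.setdefault(name, []).append(Value(data, fmt, dict(meta)))

    def values(self, name: str) -> list[Any]:
        """Return the raw data of every value stored under a field."""
        return [value.data for value in self.fields.get(name, [])]


company = Record("c-001", meta={"entered_by": "operator"})
company.add("name", "Alpha Ltd", source="corporate register")
company.add("name", "ALPHA LIMITED", source="court database")
company.add("incorporated", "2009-03-17", fmt="date", source="corporate register")
```

In this sketch conflicting values from different sources are kept side by side, each with its own source note, instead of overwriting one another.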
To simplify querying the database we have developed a user-friendly language, IQL (Intelligent Query Language), whose syntax is similar to that of SQL and which works with the structural elements of the database (not to be confused with the two-dimensional tables with which SQL works).
Polish (prefix) notation is used to describe the logical operations by which data is selected.
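To show what prefix notation means for a selection condition, here is a minimal sketch of an evaluator. The token format (AND, OR, NOT and field=value leaves) is a hypothetical one invented for this example and is not IQL syntax.

```python
def evaluate(tokens, record):
    """Evaluate a prefix-notation condition against one record (a dict)."""

    def parse(pos):
        # Returns (truth value, position of the next unread token).
        token = tokens[pos]
        if token == "NOT":
            value, pos = parse(pos + 1)
            return (not value), pos
        if token in ("AND", "OR"):
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            if token == "AND":
                return (left and right), pos
            return (left or right), pos
        # Leaf: a simple equality test written as field=value.
        name, _, expected = token.partition("=")
        return record.get(name) == expected, pos + 1

    result, end = parse(0)
    if end != len(tokens):
        raise ValueError("unexpected tokens after the end of the expression")
    return result


# Infix equivalent: jurisdiction=Cyprus AND (director=Smith OR director=Jones)
query = "AND jurisdiction=Cyprus OR director=Smith director=Jones".split()
record = {"jurisdiction": "Cyprus", "director": "Jones"}
matched = evaluate(query, record)
```

The operator comes before its operands, so no brackets or precedence rules are needed to read the expression unambiguously.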
Implementing A Neural Network
To find hidden patterns in our dataset and mine its heterogeneous data we are developing a five-layer neuro-fuzzy network built on the Mamdani fuzzy inference system.
The network is being adapted to work with corporate registries and similar datasets. It combines the methods of artificial neural networks with those of systems based on fuzzy logic. This gives a synergistic effect by combining the human-like reasoning style of fuzzy systems with the learning ability and connectionist structure of neural networks.
The system is focused on interpretability and Linguistic Fuzzy Modeling.
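For readers unfamiliar with Mamdani inference, the sketch below shows only the inference step, with no learning and no network layers. Everything in it is an assumption made for illustration: the two inputs (share of common directors and address similarity, both between 0 and 1), the two linguistic rules and the membership functions are hypothetical and are not taken from our system.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def low(x):
    """Membership in the linguistic term 'low' on the range 0..1."""
    return tri(x, -1.0, 0.0, 1.0)


def high(x):
    """Membership in the linguistic term 'high' on the range 0..1."""
    return tri(x, 0.0, 1.0, 2.0)


def infer(shared_directors, address_similarity, steps=101):
    """Mamdani inference: min for AND, max for OR, clipping, centroid output."""
    # Rule 1: IF directors are high AND address is high THEN control is high.
    fire_high = min(high(shared_directors), high(address_similarity))
    # Rule 2: IF directors are low OR address is low THEN control is low.
    fire_low = max(low(shared_directors), low(address_similarity))

    numerator = denominator = 0.0
    for i in range(steps):
        y = i / (steps - 1)
        # Clip each output set by its rule strength, then aggregate with max.
        mu = max(min(fire_high, high(y)), min(fire_low, low(y)))
        numerator += y * mu
        denominator += mu
    # Centroid defuzzification; 0.5 is a neutral fallback if no rule fires.
    return numerator / denominator if denominator else 0.5


strong = infer(0.9, 0.9)
weak = infer(0.1, 0.1)
```

The rules stay readable as plain sentences, which is the interpretability property mentioned above; in a neuro-fuzzy network the membership functions and rule weights would be adjusted by learning instead of being fixed by hand as they are here.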
The network is capable of both supervised learning (with a teacher) and unsupervised learning (on its own). The teachers work closely with a group of lawyers and investigators who formulate tasks and evaluate the adequacy of the system’s findings.
Currently the main application of the neural network is formulating assumptions as to which elements of the database are controlled by, or under the common control of, a given individual, business group or criminal group.
These findings supplement the answers to the same question derived from the two different analysis tools described above: a straight analysis of links in the database, and a search for elements in the database which have the same or similar properties as the element being analyzed.
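The straight analysis of links can be pictured as a walk over a graph of connected entities. The sketch below is an illustrative assumption, not our implementation: the link list, the entity names and the function name are hypothetical. It finds everything reachable from a starting entity within a given number of hops.

```python
from collections import deque


def linked_within(links, start, max_hops):
    """Breadth-first search: entities reachable from start and their hop count."""
    graph = {}
    for a, b in links:
        # Links are treated as undirected for this sketch.
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    del distance[start]
    return distance


links = [
    ("Alpha Ltd", "Mr. So-and-so"),
    ("Mr. So-and-so", "Beta Ltd"),
    ("Beta Ltd", "Gamma Ltd"),
    ("Delta Ltd", "Mr. Y"),
]
nearby = linked_within(links, "Alpha Ltd", 2)
```

Here Beta Ltd is found two hops from Alpha Ltd through their common director, while Gamma Ltd lies beyond the hop limit and Delta Ltd is not connected at all.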