Central to our technology and business development is the creation and continuous expansion of a sufficiently large data set covering business actors, entities, assets, relationships and activity typologies in the countries of interest. The databank’s initial regional focus is the countries of the former USSR and the offshore world; we are currently expanding our knowledge base into new areas.
This work was launched more than 10 years ago and was largely completed by 2017 through data set acquisitions, automated collection of data from the internet, and manual and semi-automated processing of unstructured data, including media reports.
The databank consists of corporate, legal, official, regulatory, personal and other data obtained from sources that are either structured (such as registers) or can be processed automatically with some effort (e.g. judicial documents). It also includes data derived from entirely unstructured sources such as news media. We have developed techniques for converting data from such sources into formats that make it available for automated analysis. The challenge is always to extract as many data points as possible to enable deeper analysis in the future.
Besides open records, which are constantly updated, the databank is enriched with sources originally available only in non-electronic form, as well as data that has never been on public record.
We work closely with experts and insiders who know the economies and business circles in focus, and we translate their expert knowledge and intuitive insights into information layers in our databank.
We ‘teach’ the databank typologies, insider knowledge and expert-defined patterns so that it can link, with varying degrees of certainty, legal structures (including opaque offshore entities) to specific assets, businesses and people.
The core data set now exceeds 15 TB and grows continuously as new records are added automatically from available online sources, new data sets are acquired, and newly processed unstructured sources become ready for import into the databank.
The ‘smart’ engine of the database is a set of ‘rules’ that can be constantly updated and refined: patterns, if-then statements, cause-effect correlations, characteristics and other pieces of knowledge about how certain data elements may be related and linked, all expressed as algorithms. To formulate the ‘rules’ we draw on experts with insider knowledge of the relevant niches.
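As an illustration only, an expert-defined ‘rule’ of this kind could be sketched as a named predicate with a certainty weight. The rule names, field names, weights and the noisy-OR way of combining them below are all assumptions made for the example, not a description of the actual engine.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]


@dataclass(frozen=True)
class Rule:
    name: str
    certainty: float  # hypothetical expert-assigned weight between 0 and 1
    matches: Callable[[Record, Record], bool]


def shared_field(field: str) -> Callable[[Record, Record], bool]:
    """Build a predicate that is true when both records share a non-empty value."""
    def predicate(a: Record, b: Record) -> bool:
        return bool(a.get(field)) and a.get(field) == b.get(field)
    return predicate


# Hypothetical if-then rules: "if two entities share X, they may be linked".
RULES = [
    Rule("same_registered_address", 0.3, shared_field("address")),
    Rule("same_nominee_director", 0.5, shared_field("director")),
]


def score_link(a: Record, b: Record, rules: list[Rule] = RULES) -> tuple[float, list[str]]:
    """Combine the certainties of all matching rules (noisy-OR) and report which fired."""
    fired = [r for r in rules if r.matches(a, b)]
    miss = 1.0
    for r in fired:
        miss *= 1.0 - r.certainty  # chance that every fired rule is a false signal
    return 1.0 - miss, [r.name for r in fired]


company_a = {"name": "Alpha Holdings Ltd", "address": "1 Example Street", "director": "J. Doe"}
company_b = {"name": "Beta Trading LLC", "address": "1 Example Street", "director": "J. Doe"}
score, fired = score_link(company_a, company_b)
print(round(score, 2), fired)
```

Keeping each rule as data (a name, a weight and a predicate) means a rule can be added, reweighted or retired without touching the scoring code, which matches the idea of a rule set that is refined over time.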
The analytical engine of the databank generates hypotheses about possible links between the subject under investigation and other data elements in the pool (legal entities, nominee owners and officers, trusted intermediaries and assistants, key managers, service providers, assets, etc.).
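One way to picture this step is as a search over a graph of weighted links that starts from the subject. The sketch below is hypothetical: the node names, the certainty values, the directed-edge layout and the choice to multiply certainties along a path are assumptions for illustration only.

```python
import heapq

# Hypothetical link graph: node -> list of (neighbour, certainty of the direct link).
LINKS = {
    "Person X": [("Nominee Y", 0.8), ("Offshore Co 1", 0.4)],
    "Nominee Y": [("Offshore Co 1", 0.9)],
    "Offshore Co 1": [("Asset Z", 0.7)],
}


def link_hypotheses(links, subject, min_certainty=0.2):
    """Return {element: best path certainty} for elements reachable from the subject.

    Path certainty is the product of link certainties along the path. The search
    is a best-first (Dijkstra-style) traversal that always expands the most
    certain element found so far.
    """
    best = {subject: 1.0}
    queue = [(-1.0, subject)]  # max-heap emulated with negated certainties
    while queue:
        negated, node = heapq.heappop(queue)
        certainty = -negated
        if certainty < best.get(node, 0.0):
            continue  # stale entry: a more certain path was already found
        for neighbour, weight in links.get(node, []):
            candidate = certainty * weight
            if candidate >= min_certainty and candidate > best.get(neighbour, 0.0):
                best[neighbour] = candidate
                heapq.heappush(queue, (-candidate, neighbour))
    del best[subject]
    return best


for element, certainty in link_hypotheses(LINKS, "Person X").items():
    print(f"{element}: {certainty:.2f}")
```

The `min_certainty` threshold (a hypothetical parameter) keeps weak, long chains of inference out of the result, so an analyst sees only hypotheses above a chosen level of confidence.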
The databank’s architecture can be described as a ‘social network in reverse’. Like a social network, it is person- and entity-centric. However, data is not added by the members of the network, who would voluntarily establish or disclose relationships and links between themselves, but by the system’s contributors or by the algorithm. Once a person is included in the database, relevant data starts to accumulate and becomes available for cross-referencing and mining.
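A minimal sketch of such a ‘reverse’ structure, assuming a simple in-memory store: every link records who asserted it (a contributor or an algorithm), and the profile of a person or entity is whatever others have attached to it. The class and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    relation: str
    added_by: str  # a contributor or an algorithm, never the subject itself


class Databank:
    """Person- and entity-centric store in which third parties assert the links."""

    def __init__(self):
        self._links_by_node = defaultdict(list)

    def add_link(self, source, target, relation, added_by):
        link = Link(source, target, relation, added_by)
        # Index the link under both ends so that either profile surfaces it.
        self._links_by_node[source].append(link)
        self._links_by_node[target].append(link)
        return link

    def profile(self, node):
        """Return everything accumulated about a person or entity so far."""
        return list(self._links_by_node.get(node, []))


db = Databank()
db.add_link("Person X", "Offshore Co 1", "suspected beneficial owner", added_by="analyst_1")
db.add_link("Offshore Co 1", "Asset Z", "registered owner", added_by="registry_import")
for link in db.profile("Offshore Co 1"):
    print(link.source, "->", link.target, f"({link.relation}, added by {link.added_by})")
```

Indexing each link under both of its ends is what lets data accumulate around a person or entity from the moment it first appears in any record.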
An important function of the databank is mapping links in the offshore-onshore nexus, where opaque and anonymous structures are linked to specific assets, businesses and people.