Mission

“Three things cannot long stay hidden:
the sun, the moon and the truth.”
--Buddha

At Deep Discovery, we are in the business of discovering secrets: the secrets that undermine the health of human civilization, secrets hiding in plain sight in the vast ocean of data across the open Internet. All while fiercely defending the privacy of the personal data of ordinary people.

In the past five years, intrepid investigative journalists have uncovered the shadowy underbelly of the global financial system in the Panama Papers, identified a hit squad of Russian assassins, and exposed Vladimir Putin’s secret construction of a multi-billion-dollar personal palace, all using public data from the open Web.

Deep Discovery’s own open-data investigations have uncovered a nuclear trafficking network, identified local officials inciting extremism around the mob attack on the U.S. Capitol through analysis of tens of terabytes of social media data, exposed slave labor in the gold mines of West Africa, and revealed large-scale counterfeiting of the Euro.

The United Nations estimates that corruption, money laundering, and other forms of illicit finance account for 5% of the global economy, or $3.6 trillion each year. This undercuts national and international efforts to solve the world’s most pressing challenges: poverty, inequality, climate change, and public health. It also undermines the democratic foundations of a healthy society.

Peter Thiel famously said, “We wanted flying cars, but we got 140 characters.” Artificial intelligence is a revolutionary technology, but it is too often used to squeeze out incremental improvements for consumers and everyday business tasks, “first world problems”. Now is the time to put the power of AI to work in the fight against corruption and to bolster the democratic freedoms and wellbeing of billions.

At Deep Discovery, our ambition is to build the first social-impact unicorn focused on ending corruption on Earth, thereby improving the lives of billions of citizens. We believe recent innovations in machine learning, deep neural networks, representation learning, graph technologies, deep link analysis, entity and identity resolution, natural language processing, and search, together with the emergence of web-scale data, make such an audacious mission tractable.

Our first products in development are a network-based financial crime risk scoring and analysis system for Know-Your-Customer (KYC) due diligence in the financial services sector and a sister product for network analysis to support investigative journalism. We are generating crime and corruption risk assessment scores for every single company in the world, and every director and officer of those companies.

Billions of data points in corporate registry data, professional profiles, social network profiles, news articles, legal case filings, shipping data, government contracts, leaks data, and much, much more comprise the vast ocean of data on the open web within which the secrets of crime and corruption lie. The tools now exist to systematically mine this data to reveal the truth and restore the balance of power towards healthy democratic societies.

We are on a mission to change the world by piercing the veil of secrecy, creating a better world built upon a foundation of transparency. If you share these values and align to this mission, come partner with us or join our team.

Team

Jeffrey Stein, CEO

Jeffrey Stein is a serial technology entrepreneur. After earning an MBA from Stanford Business School, he founded Open Data Registry, backed by venture capital from Andreessen Horowitz, which used graph database technology to provide traceability across complex global supply chains for major consumer goods companies. He then went on to co-found Orbital Insight, which pioneered the application of deep learning and other artificial intelligence technologies to automating the analysis of satellite imagery of Earth. Jeffrey was VP of Business Development, where he provided executive leadership in growing the company from zero to over 100 employees and raising $80 million in venture capital from Sequoia Capital, Google, Lux Capital, Bloomberg, and In-Q-Tel.

Russell Jurney, CTO

Russell Jurney is a machine learning and data engineer with 18 years of experience building apps from data. He joined Jeffrey as co-founder and CTO of Deep Discovery. Previously, he founded Relato, which analyzed markets with its own business graph extracted from the business web. He was also CTO of Archipelo, a venture-backed code search company; an early Senior Data Scientist in Product Analytics at LinkedIn; and the first Hadoop Evangelist at Hortonworks. Russell is also the author of four O’Reilly books, including Agile Data Science 2.0. He blogs about open source at Data Syndrome.

Careers

At Deep Discovery we’re passionate about two things:

  • Applied research in AI and representation learning for text, structured data, and networks
  • Shipping product regularly

Most research at most companies never leaves the lab. We don’t have that problem.

Technical skills in machine learning and visualization are prerequisites for working here, but they are not sufficient. You must be passionate about getting the best tools into users’ hands, where they can make the most impact. You need to develop domain expertise. Interested in joining our team? Please send us a resume and any links to your portfolio of work demonstrating your track record of shipping data-driven products.

To apply, email changeworld@deepdiscovery.ai. Please include a LinkedIn profile or resume, a link to your GitHub portfolio, and a note explaining why you want to work here.

We are hiring for the following roles:

Machine Learning Engineer Job Description

Business Graph Representation Learning

Deep Discovery is hiring a Machine Learning (ML) Engineer with deep learning and search experience to build a neural network representation of the business graph, incorporating text and structured representations. We extract structured information from news and other text to build networks that drive the models generating risk scores for banks’ customers, which banks use when conducting background checks. This is called a Know Your Customer (KYC) system for Anti-Money Laundering (AML); banks use these systems to evaluate the risk of doing business with their clients so they don’t face stiff fines from regulatory agencies.

We are taking a network-centric approach to KYC that evaluates clients in terms of the context in which they do business. This involves several machine learning tasks: extracting knowledge graphs from news and other text; entity and identity resolution across the networks we collect about the economy; representation learning on the resulting graphs and their associated documents; and building a scoring engine that uses our business graph to create an accurate risk score. Users do not believe predictions without explanations, and the cost of errors is high, so the final machine learning component is the most critical: the system must be explainable in terms of the graphs from which we draw conclusions, and we use a graph database and network visualizations to explain our risk scores.
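As a rough illustration of what scoring-with-explanation means, here is a minimal, hypothetical sketch in Python. It is not our production system: the graph, the base risks, and the blending weight are all invented. It shows the core idea that an entity’s score depends on its neighbors, and that the contributing neighbors are returned alongside the score as the explanation.

```python
# Illustrative sketch only: a toy network-centric risk score that blends an
# entity's own risk with the mean risk of its neighbors, and keeps the
# neighbors (sorted by risk) so every score ships with an explanation.

# Hypothetical toy business graph: entity -> connected entities.
graph = {
    "AcmeCorp": ["ShellCo", "TradeCo"],
    "ShellCo": ["AcmeCorp"],
    "TradeCo": ["AcmeCorp"],
}

# Hypothetical base risk per entity, e.g. from sanctions or news screening.
base_risk = {"AcmeCorp": 0.1, "ShellCo": 0.9, "TradeCo": 0.2}

def score_with_explanation(entity, alpha=0.5):
    """Blend an entity's own risk with the mean risk of its neighbors;
    return (score, neighbors sorted by their risk contribution)."""
    neighbor_risk = [(n, base_risk[n]) for n in graph.get(entity, [])]
    mean_neighbor = (
        sum(r for _, r in neighbor_risk) / len(neighbor_risk)
        if neighbor_risk else 0.0
    )
    score = (1 - alpha) * base_risk[entity] + alpha * mean_neighbor
    explanation = sorted(neighbor_risk, key=lambda x: -x[1])
    return score, explanation

score, why = score_with_explanation("AcmeCorp")
print(score, why[0][0])  # the highest-risk neighbor drives the score
```

A real system replaces the hand-coded blend with learned graph models, but the contract is the same: no score without the subgraph that produced it.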

We’re looking for a self-motivated ML Engineer with the background to be productive: someone who can learn the domain quickly, mine the literature, and customize algorithms and systems to come up with novel solutions to the problems we face in delivering a product. While being published is good, the most important thing we want in a candidate is a track record of shipping products to real customers. We have data engineers, but we expect you to be fairly self-supporting in carrying out your work, so generalist skills are important. Candidates without advanced degrees are welcome; experience is education.

The ideal candidate will have:
  • Early-stage startup experience
  • A track record of shipping data-driven products to market
  • Solid Python 3 skills, including object-oriented analysis and design
  • Entity (ER) and identity resolution (IR) skills
  • Natural Language Processing (NLP) skills
  • Working knowledge of various neural network architectures
  • Working knowledge of Graph Neural Networks (GNNs)
  • Published papers in CS or other quantitative fields

Data Engineer Job Description

Collect, ingest, index, schedule, stream, pipeline, Dev+MLops

Deep Discovery is hiring a Data Engineer to build the infrastructure for acquiring, ingesting, processing, indexing and retrieving the data streams we consume, and to build the platforms that enable the training, tuning, deployment and monitoring of the statistical models behind the background checks our product produces. This is called a Know Your Customer (KYC) system, and banks use these systems to evaluate the risk of doing business with their clients so they don’t face stiff fines, as much as $10 billion each year, from global regulatory agencies.

We are taking a network-centric approach to KYC that evaluates clients in terms of the context in which they do business. This involves building and operating several systems to perform data engineering tasks:
  • A data collection framework for frequently updating web crawls and APIs
  • A workflow system and batch scheduler to run jobs both large and small
  • A system to manage networks of reliable workers to process real-time streams of data, operations and commands
  • An automation system for machine learning operations, covering feature extraction and selection; model training, selection, and tuning; and model deployment and monitoring
  • API services to host models and evaluate them on data on the fly
  • A build system for creating a single, verifiable build of the end-to-end product and all of its systems, including its user interfaces, from a set of source repositories
  • A catalog of custom Docker images for performing each task
  • Various databases we use to serve the system and its assets: relational, graph, search
  • A rigorous system of quality assurance testing using the above triggered by continuous integration directly from source code
  • Dashboards through which to use, control and understand the above

The ideal candidate has 5+ years of experience in data engineering or machine learning operations, has early-stage startup experience, and is excited by the opportunity to define and build the data and machine learning systems driving a mission-critical application for the finance industry. Finance experience is a plus. Strong Python skills are essential, and search experience is required.

Visualization Engineer Job Description

Deep Discovery is hiring a Visualization Engineer with search experience to lead development of a state-of-the-art interface for bank background checks. This is called a Know Your Customer (KYC) system, and banks use these systems to evaluate the risk of doing business with their clients so they don’t face stiff fines from regulatory agencies. We are taking a network-centric approach to KYC that evaluates clients in terms of the context in which they do business, and we are building a user interface that presents this information, carefully optimized around the information needs of the end user.

Among other uses of data visualization, our UI will use network visualization to present a risk score along with an explanation of how the network around the client contributed to the score. This will enable banks to evaluate clients in terms of the network in which they do business. Before we get to experience the visualization engineer’s dream of building Palantir 2.0, we have to find the right client to display in the first place. To do that, we need a front-end developer who has experience building user interfaces for information retrieval. The process of filtering and evaluating evidence for KYC checks is inherently iterative, and we will need to carefully refine the results in a way that users love.

The ideal candidate has 5+ years of experience as a front-end engineer, 3+ years of experience with data visualization or building dashboards, and 1+ years of experience building interfaces to a search engine or some other form of information retrieval. You don’t have to be a designer, but you do need to demonstrate that you’ve built some beautiful things. We want a full partner in refining the interface to be the best ever built in this industry. You don’t have to be a backend developer, but you do have to code your own APIs using Node.js, and you know how to deploy a web app to the cloud.

The ideal candidate will check the following boxes (with a number 2 pencil):
  • 5+ years with late-model JavaScript (the kind that compiles)
  • 2+ years of React experience
  • Working knowledge of d3.js or another JavaScript visualization library
  • TypeScript experience
  • A portfolio of visualization work you can share
  • 2+ years of Node.js experience
  • A Computer Science degree is a plus but is not required
  • Working knowledge of deploying web applications on AWS or GCP

Note the use of the phrase “ideal candidate” and the reality that not everything in life is ideal, but that’s alright if you get the shit done. In other words… don’t hesitate to apply if you don’t check every box.

Data Scientist Job Description

Catalogue, collect, ingest, profile, clean, integrate, analyze

Deep Discovery is hiring a Data Scientist to work on finding, collecting, ingesting, describing, cleaning and analyzing new datasets to incorporate into the system behind the background checks our product produces. This is called a Know Your Customer (KYC) system, and banks use these systems to evaluate the risk of doing business with their clients so they don’t face stiff fines, as much as $14 billion each year, from global regulatory agencies.

We are taking a network-centric approach to KYC that evaluates clients in terms of the context in which they do business. In order to build, annotate and extend a large business graph of the world’s light and dark economy, we need to ingest an increasing number of datasets to provide our risk scoring system with the features it needs to determine risk.

As a data scientist on our team, your job will be to:
  • Learn the world of FinTech to develop the intuition to build out and manage a “dataset universe” catalog of available data sources
  • Create a distributed data collection platform that can efficiently execute and update data crawls
  • Work ahead of the team to collect a “hit list” of datasets we anticipate needing
  • Respond to ad hoc requests for dataset collection, determining a method of collection, rate limits, proxy requirements and budgets
  • Work with datasets at scale using Python and Apache Spark / PySpark
  • Automatically profile datasets to create reports on their health and the level of work required to put them to use
  • Clean datasets using Python and tools like Trifacta to make them usable
  • Join and integrate new datasets with existing ones and annotate our business graph
  • Grow into more sophisticated duties and then train your replacement!
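The profiling step above can be sketched with a minimal, hypothetical example in Python (stdlib only, invented sample data; real profiling would cover types, distributions, duplicates and more). It reports per-column completeness and distinct-value counts, the kind of health report that flags how much cleaning a new source needs.

```python
# Illustrative sketch only: profile a list-of-dicts dataset, reporting the
# null rate and number of distinct values for each column.

def profile(rows):
    """rows: list of dicts sharing the same keys. Returns per-column stats."""
    columns = rows[0].keys() if rows else []
    report = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v not in (None, "", "N/A")]
        report[col] = {
            "null_rate": 1 - len(non_null) / len(values),
            "distinct": len(set(non_null)),
        }
    return report

# Hypothetical registry extract with one missing jurisdiction value.
rows = [
    {"company": "AcmeCorp", "jurisdiction": "PA"},
    {"company": "ShellCo", "jurisdiction": ""},
    {"company": "TradeCo", "jurisdiction": "VG"},
]
report = profile(rows)
print(report["jurisdiction"])  # one of three values missing
```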

The ideal candidate has a CS, Math, Statistics or Physics degree and 3+ years of Python experience working with data; has an active GitHub portfolio of original projects; has research or early-stage startup experience; and is excited by the opportunity to feed and nourish a mission-critical application for the finance industry. Big data experience with Spark, Hadoop, Dask or another such system is a plus.

Investigative Data Journalist Job Description

Investigate, analyze, report, test, guide, catch money launderers

Deep Discovery is hiring an Investigative Data Journalist to use its Anti-Money Laundering (AML) tools to investigate and catch money launderers and criminal networks. The role needs a senior journalist with the background and investigative skills not just to provide expert feedback, but to come up with new methods and investigative techniques for our engineering team to automate. These case studies will be used as training data for the algorithms that assign a risk score to banks’ customers. This is called a Know Your Customer (KYC) system, and banks use these systems to evaluate the risk of doing business with their clients so they don’t face stiff fines, as much as $14 billion each year, from global regulatory agencies. In addition to helping banks, accredited investigative journalists will be given access to our tools, and you’ll be the first journalist to test and guide the product.

We are taking a network-centric approach to KYC that evaluates clients in terms of the context in which they do business. We are building an enormous business graph covering the world’s light and dark economies, and will mine this graph to reach a deeper understanding of financial risk. Your job will be to use your investigative skills and our tools to shine a light on the actions of bad guys whose illicit activities constitute as much as 5% of the world’s economy. This is a chance to own a significant stake in the company you work for while practicing your trade.

The ideal candidate for this role will have the following qualifications:
  • 5+ years of experience in investigative journalism or intelligence analysis
  • Track record of original reporting on financial crime
  • Track record of employing data collection and analysis in journalism
  • Data collection skills using code or an automated tool
  • Basic proficiency in data processing with Python, R or SQL
  • Data analysis and visualization skills
  • Journalism degree a plus but not required

Contact

For sales, email sales@deepdiscovery.ai.

For jobs, email changeworld@deepdiscovery.ai.

Please include a LinkedIn profile or resume, a link to your GitHub portfolio, and a note explaining why you want to work here.