[This is a guest post by Rudraksh Lakra and Medha Kolanu.]
In June 2026, two developments in quick succession illustrated the expanding reach of state surveillance. On 19 June, Home Minister Amit Shah launched Abhigyan, a mobile application developed by the National Crime Records Bureau (“NCRB”) that enables police officers to scan fingerprints in the field using a smartphone and receive a criminal history match from a national database within 35 seconds. Days later, reports emerged that the Central Industrial Security Force (“CISF”) was proposing to deploy facial recognition cameras across six major Indian airports.
The database underpinning Abhigyan is the National Automated Fingerprint Identification System (“NAFIS”), which contains over 1.3 crore fingerprint records. NAFIS is integrated with the Crime and Criminal Tracking Network and Systems (“CCTNS”), the nationwide policing platform containing FIRs, investigation reports, charge sheets, criminal histories, arrest records, and related administrative information. The proposed airport surveillance network, meanwhile, would feed into the National Intelligence Grid (“NATGRID”), an intelligence platform linking at least twenty-one categories of government databases. In both instances, surveillance operates through large, centralised databases that aggregate and correlate information across multiple systems.
The accumulation of vast data repositories has become an inevitable feature of both public administration and private enterprise. Since the advent of the internet, increasingly detailed digital footprints have been generated through smartphones, laptops, wearable devices, Internet of Things ecosystems, e-governance initiatives, and digital public infrastructure. The objective is increasingly to map and classify every aspect of individual and social life, from movement and transactions to communications and patterns of association (see here, and here). Surveillance now operates through an intertwined relationship between the state and private actors, with private companies often possessing more detailed information about individuals than the state itself (ibid, and see here).
In India, surveillance practices operate across a spectrum ranging from legal “black holes” to “grey zones”. Black holes are surveillance regimes that function without a clear statutory basis, notwithstanding their significant operational reach. Programmes such as NATGRID, the CCTNS, and NAFIS, many of which evolved in the aftermath of the 26/11 Mumbai attacks, exemplify this category. Grey zones refer to situations of “lawful illegality”, where formal legal frameworks exist but fail to adhere to a meaningful conception of the rule of law. Targeted surveillance falls within this category.
These surveillance regimes have attracted sustained criticism (see here, here, here and here). They frequently lack meaningful and independent oversight, both at the stage of authorisation and after surveillance has been conducted (ibid). Law enforcement agencies enjoy broad discretionary powers, while the governing legal frameworks provide only limited safeguards relating to data protection, transparency, accountability, and effective remedies (ibid). These features are part of an architecture of authoritarianism and suggest a continuing colonial logic of governance, in which expansive surveillance powers and limited accountability mechanisms privilege state authority (see here, here, and here). Historically, the Indian Supreme Court has also permitted the admission of illegally obtained evidence. Consequently, even where surveillance authorisation procedures were violated, individuals often had little practical recourse against the state. Following the Supreme Court’s decision in Justice K. S. Puttaswamy v. Union of India (2017), which recognised privacy as a fundamental right, a stronger case exists for adopting the “fruit of the poisonous tree” doctrine, under which evidence derived from unconstitutional surveillance would likewise be excluded (see here, and here). Although some High Courts have shown openness to this approach, the broader judicial position remains unsettled.
Although privacy scholarship has extensively examined surveillance authorisation and procedural safeguards, one issue remains under-theorised: how should constitutional privacy law regulate databases themselves, particularly large-scale, interconnected ones? The question arises both when the state directly searches a database and when it compels a private entity to search one on its behalf. As investigations increasingly depend on querying vast datasets and correlating information across multiple systems, privacy doctrine must account for the nature of the data collected, the scale of the repository, the design of search queries, and the inferences they generate. This blog introduces a constitutional framework for analysing database searches. It begins with the leading account proposed by Orin Kerr, identifies its limitations, and advances an alternative query-based model for understanding privacy harms in the age of large-scale databases. The query-based model is developed in greater detail in a forthcoming paper.
Filter-focused model
A recent attempt to address this issue is found in the work of Orin Kerr, who proposes a filter-focused model for analysing searches across digital databases. Kerr developed this framework in response to emerging creative investigative techniques. These techniques challenge the conventional Fourth Amendment doctrine, which protects “persons, houses, papers, and effects” against unreasonable searches and seizures. For instance, in People v. Seymour, investigators asked Google to identify users who had searched for a specific address associated with a criminal investigation by querying its entire Search and Maps records database (a reverse search warrant). A related technique is the geofence warrant, through which police request information about devices located within a geographic area during a specified time period. Another method is the tower dump, by which authorities obtain records of all devices connected to a particular cellular tower within a specified timeframe. In each case, the traditional warrant model is reversed. Instead of identifying a specific person or place in advance, investigators search the entire database and then narrow the results.
Kerr notes that existing Fourth Amendment principles provide limited guidance for determining how constitutional protections should apply to these forms of database searching. Analogies drawn from physical searches are of limited assistance in digital environments. In a physical search, officers move objects to bring items into view, exposing them to human observation. Digital investigations instead involve computers scanning vast quantities of stored data. The key constitutional question, therefore, becomes whether the search occurs at the moment of scanning, at the point of a positive match, or only when the results are presented to investigators.
Kerr argues for the adoption of a filter-focused model. Rather than emphasising the size of the database, this approach focuses on the design of the filter applied during the search. A digital filter runs through a dataset and produces an output visible to investigators. Constitutional analysis, thus, focuses on what information the output reveals and what inferences may be drawn from the filter used to generate it. The constitutional question remains: “What information becomes exposed to human observation through that filter?” This reasoning subsequently appeared in Seymour, where the Court upheld a reverse search warrant directed at Google. Although the warrant required scanning a database containing records of numerous users, the Court stressed that the search was tightly circumscribed. It was limited to queries for a specific address associated with the crime scene within a narrow time frame. The initial results were anonymised, and only a small number of records were ultimately disclosed. The legality of the search turned not on the size of the database but on the precision of the search parameters.
Query-Based Model
Kerr’s model provides a useful framework for analysing digital surveillance by focusing on what can be inferred from the output of a database search in light of the filter applied. However, the model remains incomplete in two important respects. First, Kerr argues that neither the mere collection of data nor the act of searching raises a privacy concern; rather, the concern arises only at the stage of disclosure. In this view, the size, nature, and granularity of the database matter only to the extent that they shape the filter’s output. The database itself is treated as constitutionally inert until the results are revealed. This approach understates the independent constitutional significance of the filter. The design and configuration of the search filter can implicate privacy interests even before results are produced. If a database contains information protected by the Fourth Amendment, querying that database constitutes a search that must itself satisfy constitutional requirements. Constitutional scrutiny cannot be limited to the final output; it must also extend to the scope and structure of the authorised query.
Consider United States v. Smith (2024), where the Fifth Circuit held:
When law enforcement submits a geofence warrant to Google, Step 1 forces the company to search through its entire database to provide a new dataset that is derived from its entire Sensorvault. In other words, law enforcement cannot obtain its requested location data unless Google searches through the entirety of its Sensorvault—all 592 million individual accounts—for all of their locations at a given point in time. Moreover, this search is occurring while law enforcement officials have no idea who they are looking for, or whether the search will even turn up a result. Indeed, the quintessential problem with these warrants is that they never include a specific user to be identified, only a temporal and geographic location where any given user may turn up post-search. That is constitutionally insufficient. (Page 32)
Kerr criticises this reasoning to the extent that it suggests any search across a massive database is inherently suspect. On this point, his critique is persuasive. The mere fact that a search operates across a large dataset does not make it unconstitutional. In digital systems, large-scale queries are often technologically unavoidable. However, Kerr does not fully engage with the more significant aspect of the Court’s reasoning: the absence of particularity. The problem was not simply that Google had to query a large database, but that the warrant failed to specify a particular person or place. Instead, it authorised a search defined only by time and geography, allowing investigators to identify whoever happened to be present in the area.
This reasoning highlights a central constitutional principle. Both the search filter and the resulting output must satisfy the requirement of specificity. A filter that lacks a clear nexus to a particular suspect or defined place, or that is not confined within reasonable temporal limits, risks functioning as a general warrant. Even if later stages narrow the output, the initial authorisation may already have permitted a broad and open-ended search inconsistent with constitutional safeguards. The constitutional problem with geofence warrants, tower dumps, and similar dragnet searches, therefore, does not lie simply in querying a large database. Rather, it lies in the absence of a narrowly defined filter linked to a particular person or place, confined within a reasonable time period, and grounded in a specific investigative objective.
Second, in developing the filter-focused model, Kerr largely conceptualises database searches as mechanistic queries. Under this model, a computer scans a dataset for a precisely defined input, such as a specific word, number, or combination of characters. The system converts the query into machine-readable form, for example, ASCII, checks each record for an exact match, and returns only those entries satisfying the criteria. For instance, law enforcement may query call detail records using a specific phone number to retrieve all calls made to or from that number within a defined period. The phone number functions as the filter, and the system returns only the matching records. The same logic applies to searching an email database for a specific keyword or querying a government database using a unique identification number.
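The exact-match logic described above can be sketched in a few lines of Python. This is an illustration only: the record fields (caller, callee, start), the sample numbers, and the function filter_cdr are hypothetical and do not describe any real law enforcement system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical call detail record; the field names are assumptions.
    caller: str
    callee: str
    start: datetime


def filter_cdr(records, number, begin, end):
    """Return only records where `number` is a party and the call starts in [begin, end)."""
    return [
        r for r in records
        if number in (r.caller, r.callee) and begin <= r.start < end
    ]


records = [
    CallRecord("5550001", "5550002", datetime(2024, 1, 5, 10, 0)),
    CallRecord("5550003", "5550001", datetime(2024, 1, 6, 18, 30)),
    CallRecord("5550002", "5550003", datetime(2024, 1, 6, 19, 0)),
    CallRecord("5550001", "5550004", datetime(2024, 2, 1, 9, 15)),
]

# The phone number and the time window together form the filter.
matches = filter_cdr(records, "5550001", datetime(2024, 1, 1), datetime(2024, 2, 1))
print(len(matches), "of", len(records), "records disclosed")
```

Every record is scanned, but only those satisfying the filter are exposed to the investigator. That gap between what is scanned and what is disclosed is the distinction on which the filter-focused model relies.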
Kerr’s model is built around this understanding of search as an exact-match, filter-based operation. However, restricting the concept of search to exact-match queries is increasingly difficult to sustain in contemporary digital systems. Many modern searches rely on probabilistic or similarity-based techniques that lie between strict matching and full algorithmic inference. For example, law enforcement systems searching for child sexual abuse material often use hash databases, where files are compared against known digital fingerprints rather than exact file names or identifiers. Similarly, biometric identification systems rely on facial recognition algorithms that perform one-to-many (1:N) or many-to-many (N:N) matching across large databases (see here, and here).
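To show how a similarity-based search differs from an exact-match filter, the following is a minimal sketch of one-to-many matching. It rests on loud assumptions: real biometric systems use proprietary feature extraction and scoring, whereas this sketch uses cosine similarity over toy three-number "templates". The gallery, the probe, the thresholds, and the function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Similarity score between two templates; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def one_to_many(probe, gallery, threshold):
    """Score one probe against every enrolled template and keep scores >= threshold."""
    scored = [(cosine_similarity(probe, tpl), pid) for pid, tpl in gallery.items()]
    candidates = [(score, pid) for score, pid in scored if score >= threshold]
    return sorted(candidates, reverse=True)  # best score first


# Hypothetical enrolled templates (stand-ins for biometric feature vectors).
gallery = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.8, 0.6, 0.0],
    "C": [0.0, 0.0, 1.0],
}
probe = [0.95, 0.3, 0.1]  # an imperfect capture that resembles both A and B

strict = one_to_many(probe, gallery, threshold=0.945)
loose = one_to_many(probe, gallery, threshold=0.90)
print("strict:", [pid for _, pid in strict])
print("loose: ", [pid for _, pid in loose])
```

The same probe against the same gallery yields a different candidate list depending on the threshold. In a similarity-based query, the filter therefore includes a tunable decision rule, not only the input that the investigator supplies.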
For constitutional analysis, the concept of search should therefore be understood more broadly as a query directed at a dataset. What matters is that the state directs a computational query at a database to extract information about individuals, patterns, or relationships. Whether the method relies on exact matching, probabilistic similarity scoring, or machine learning or AI does not alter the underlying fact that the state is interrogating the dataset. This broader framework may be described as a query-based approach.
This understanding is particularly important where algorithmic systems generate profiles or predictive assessments. Governments increasingly deploy such tools to analyse large administrative databases in areas such as welfare administration, fraud detection, and risk scoring. These systems analyse behavioural patterns, transaction histories, and relational networks to identify individuals who satisfy defined risk profiles. Even where the system is not searching for a single keyword or identifier, it nevertheless conducts a structured interrogation of the dataset to identify individuals meeting analytical criteria. Functionally, this remains a form of database search.
Accordingly, under a query-based approach, the concept of search should encompass both deterministic and probabilistic database queries, including AI-driven analysis. The constitutional concern does not depend on whether the query relies on exact matching or probabilistic inference. Instead, it turns on the scope of the computational query, the nature of the database, the information exposed to scrutiny, and the inferences the state is able to draw from the resulting output.
Abhigyan and NAFIS
NAFIS was first announced by the Ministry of Home Affairs in December 2024 as a consolidated national fingerprint repository, bringing together under the NCRB datasets that state police departments had previously maintained separately. By October 2024, the NCRB had already compiled over 1.06 crore fingerprint records. NAFIS also operates within India’s broader digital policing ecosystem through its integration with the CCTNS, the nationwide policing platform containing FIRs, investigation reports, charge sheets, criminal histories, arrest records, and related administrative information. A biometric match can therefore do far more than identify a fingerprint. It can retrieve criminal justice information associated with that individual from policing records across states, departments, agencies, and stations. A fingerprint search, in other words, was never a search for a fingerprint alone. It was always an entry point into a much larger informational environment capable of generating detailed investigative profiles through the linkage of multiple databases.
Abhigyan does not alter this underlying architecture; it alters the point of access. Where a fingerprint query once required some institutional process, such as a station visit where dedicated scanning equipment may be available, a recordkeeping step, or a formal request to access the repository, Abhigyan compresses this into a smartphone application usable by any field officer during a routine encounter. A vehicle check, a street stop, or an interaction with someone an officer regards as suspicious can now generate a query against the full national repository, with a result reportedly returned in approximately 35 seconds.
A query-based analysis asks whether the search filter applied is narrowly tailored and tied to a specific investigative purpose. But, within NAFIS, every comparison effectively runs the recovered print against the entire repository, meaning individuals whose biometric data sits within the system remain continuously searchable across investigations occurring anywhere in the country. Because NAFIS is linked to CCTNS, a successful match may immediately expose not only biometric identity but also connected FIRs, investigation records, charge sheets, criminal histories, and other policing information. The informational consequences of the query, therefore, extend well beyond biometric identification. This converts the relationship between an individual and investigative agencies into one of suspicion. Scrutiny no longer follows from independent evidence connecting a person to an offence. Inclusion within the dataset becomes its own basis for repeated examination.
Abhigyan sharpens this concern by removing the procedural friction that once separated a routine encounter from a database query. The PIB description of the application’s use case, identifying “suspicious individuals” during “routine vehicle checks,” suggests that the query is authorised by an officer’s discretion at the point of contact, rather than by a particularised suspicion connected to a known offence. A search filter without a defined nexus to a specific person, place, or investigation functions, in substance, as an open-ended search, irrespective of how efficiently the result is generated. By enabling immediate access to a nationally networked biometric repository from a handheld device, Abhigyan substantially lowers the practical threshold for initiating searches against individuals encountered during ordinary policing activities.
The probabilistic nature of fingerprint matching adds a further layer of concern. Automated identification relies on similarity scoring rather than deterministic comparison, and its reliability may be further reduced where field-collected fingerprints are partial or degraded. Petitioners during the Aadhaar litigation argued that fingerprint authentication carries measurable error rates due to the quality of prints, the conditions of collection, and the limitations of matching technology, and that within a database of this scale, quality concerns become more pronounced and the probability of erroneous identification increases accordingly (see here, here, and here). A 35-second turnaround, while framed as a marker of efficiency, also compresses the time available to assess the reliability of an algorithmic output before it is acted upon in the field. Where the resulting match simultaneously unlocks access to extensive criminal justice records through CCTNS, the consequences of an erroneous identification become considerably more significant than a mistaken biometric comparison alone.
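The intuition that scale raises the risk of erroneous identification can be made concrete with a short calculation. It is a simplified sketch under stated assumptions: each one-to-one comparison is treated as independent, and the per-comparison false match rate used below (1 in 100 million) is a hypothetical figure chosen for illustration, not a measured error rate for NAFIS or any other system. Under those assumptions, the chance of at least one false match when a print is compared against N records is 1 - (1 - f)^N.

```python
import math


def prob_any_false_match(false_match_rate, database_size):
    """P(at least one false match) = 1 - (1 - f)^N, assuming independent comparisons."""
    # log1p/expm1 keep the result accurate when f is tiny and N is large.
    return -math.expm1(database_size * math.log1p(-false_match_rate))


ASSUMED_FMR = 1e-8  # hypothetical per-comparison false match rate, not a measured value

for size in (1_000_000, 13_000_000, 100_000_000):
    p = prob_any_false_match(ASSUMED_FMR, size)
    print(f"{size:>11,} records -> {p:.1%} chance of at least one false match")
```

With the assumed rate, a repository of 1.3 crore records gives roughly a 12 per cent chance that a single search returns at least one false candidate, even though each individual comparison is almost never wrong. The specific figure matters less than the structure: holding per-comparison accuracy constant, the probability grows with the size of the repository.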
The government cites the Criminal Procedure (Identification) Act, 2022 as the statutory basis for the construction of this repository, but the Act does little to discipline the subsequent act of querying it. Section 3 materially expanded the categories of persons from whom biometric measurements may be collected, extending beyond convicted persons to include arrestees, detainees, persons ordered to give security for maintaining public order or good behaviour, and others falling within its broad scope (also see here, here, and here). It authorises the collection of fingerprints and other biometric identifiers and permits their retention for periods extending up to seventy-five years (Section 4(2)). The combined effect is to substantially enlarge the pool of individuals whose biometric data becomes available for future investigative searches.
The Act also does not answer when, and on what threshold, those stored biometric records may subsequently be queried. Authorisation to collect and retain biometric data is effectively treated as sufficient authority to search that data whenever an officer chooses to initiate a scan. This conflates two distinct legal issues: the legality of constructing the database and the legality of each subsequent search conducted against it. The former does not automatically justify the latter. A statutory power to retain fingerprints does not answer the separate question of whether a field officer may query a national repository during a routine stop, absent any demonstrable investigative nexus.
Thus, an individual arrested but never convicted may remain within NAFIS for decades, with no clear statutory mechanism ensuring deletion where continued retention becomes difficult to justify. Through its integration with CCTNS, that individual’s biometric record remains continuously capable of revealing associated policing information whenever a search is initiated. Abhigyan ensures that this person can now be identified by any authorised officer, in any location, through an instantaneous field query conducted on the strength of a database entry that may never have resulted in a finding of guilt. Viewed through the query-based lens, the constitutional difficulty therefore lies not simply in the existence of a fingerprint database, but in the combination of an expansive collection regime, prolonged retention, extensive cross-database integration, probabilistic matching, and a query mechanism that permits repeated searches without requiring a clearly defined investigative justification before they are initiated.
Airport Facial Recognition and NATGRID
NATGRID was established in 2010 in response to the intelligence coordination failures exposed by the 2008 Mumbai attacks. It does not operate as a conventional database. It functions as middleware, a query infrastructure connecting at least twenty-one categories of datasets maintained across separate government and private entities, spanning banking and financial transactions, telecommunications subscriber data, passport and immigration records, tax filings, travel histories, and vehicle registration information. In December 2025, this architecture was extended further through integration with the National Population Register, which holds family-wise demographic data for 119 crore residents.
The defining feature of this system, from a query-based standpoint, is that a single identifier can trigger simultaneous searches across every connected dataset. A phone number or passport number entered as a query parameter does not return information from one database. It returns whatever can be correlated across financial records, communications data, travel patterns, and administrative identifiers at once. The effective scope of a query, then, is not determined by how narrowly an investigator frames the search. It is determined by the breadth of what the platform has been built to connect.
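The fan-out behaviour described above can be sketched as follows. NATGRID's actual implementation is not public, so this is purely an illustration of the general pattern: the dataset names, identifiers, record contents, and the function fan_out are all hypothetical.

```python
# Hypothetical linked datasets, each keyed by a shared identifier.
DATASETS = {
    "telecom": {"P123": ["subscriber record"]},
    "banking": {"P123": ["account summary"], "P456": ["account summary"]},
    "travel": {"P123": ["flight booking"]},
    "vehicle": {"P456": ["registration"]},
}


def fan_out(identifier, datasets, allowed=None):
    """Look up one identifier in every connected dataset, or only in the `allowed` ones."""
    names = datasets.keys() if allowed is None else allowed
    return {
        name: datasets[name][identifier]
        for name in names
        if identifier in datasets[name]
    }


# Same identifier, same platform; only the dataset boundary differs.
unbounded = fan_out("P123", DATASETS)
bounded = fan_out("P123", DATASETS, allowed=["travel"])
print("unbounded reach:", sorted(unbounded))
print("bounded reach:  ", sorted(bounded))
```

The investigator's input is identical in both calls. What changes the scope of the search is whether the query carries an explicit limit on the datasets it may reach. Without such a limit, the reach of the query is set by whatever the platform has been built to connect.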
This is the central difficulty with the CISF proposal to integrate facial recognition at six major airports into a national data fusion centre cross-referenced against NATGRID. A traveller’s face, captured at an airport gate, becomes a query parameter capable of activating searches across systems that bear no apparent relationship to airport security. The query-based framework requires that a search filter be tied to a specific investigative objective. It becomes difficult to identify what that objective is once a facial match at a checkpoint is permitted to cascade into an inquiry across multiple connected databases, none of which were within contemplation when the camera captured the image.
The structural design of the proposed deployment compounds this concern. CISF officials have described cameras positioned at entry and exit points across airports, intended to flag fugitives and persons of interest as travellers pass through. This is not a targeted search authorised against a known suspect. It is a continuous screening of the travelling public, with each face automatically compared against stored templates. Within a query-based analysis, this constitutes population-level interrogation rather than investigation, a distinction that carries real constitutional weight. A query directed at a specific, named individual suspected of a specific offence operates within the bounds that particularity requires. A query that screens an entire travelling population, on the possibility that some unspecified match may surface, does not.
NATGRID’s own architecture deepens the inferential stakes of this integration. The platform incorporates tools such as Gandiva, which enable entity resolution and relational analysis across connected datasets, generating behavioural and associational profiles that extend well beyond what any single database could reveal on its own. A facial match at an airport gate, layered onto this infrastructure, becomes a potential entry point into a far larger correlation exercise, capable of reconstructing a traveller’s movement history, financial conduct, and family relationships from datasets that had no connection to the original purpose for which the airport camera was installed.
Officials have indicated that privacy and data protection will remain “a priority,” yet the public record offers no detail on the legal checks, technical limits, or independent oversight that would give that assurance substance. Under the query-based model developed here, that absence is itself the constitutional defect. The legitimacy of the system cannot rest on the eventual reliability of a match; it depends on whether the query authorising the search is confined by a defined subject, a defined purpose, and a defined boundary on the datasets it may reach. As proposed, the CISF system supplies none of these constraints, and the gap between aspiration and architecture is where the privacy concern actually resides.
Conclusion
The aim of this blog was to introduce and outline the contours of a query-based approach by applying it to Abhigyan and NAFIS, and to airport facial recognition integrated with NATGRID. While there may be reasonable disagreement about some of its conclusions, the principal value of the model lies not in prescribing fixed outcomes but in providing a structured framework for analysing how database surveillance operates. It serves as a methodological tool that enables courts, regulators, and policymakers to identify the constitutional questions that arise when the state interrogates large datasets. By examining the design of a query, the nature of the dataset, and the inferential capacity of analytical tools, the framework connects technical architecture with constitutional doctrine.
Its relevance extends beyond criminal investigations and surveillance to any context in which databases are queried, including digital public infrastructure, welfare administration, financial monitoring, and other forms of digital governance. In the forthcoming paper, we develop the framework in greater detail, also apply it to targeted surveillance, and propose a structured set of questions to constitutionally operationalise the query-based approach. The paper also advances high-level safeguards for database surveillance at three levels: the legal authorisation of database queries, the technical systems through which queries are executed, and the design of the databases that enable them.

