“Trust me, I know what I’m doing!” – Court Outlines Perils of Custodian Self-Collection and Inadequate Keyword Searches
In a recent ruling, Judge Shira Scheindlin of the U.S. District Court for the Southern District of New York, an e-discovery authority of Zubulake and Pension Committee fame, held that various government agencies had failed to adequately design searches for responsive electronically stored information. While the case, National Day Laborer Org. Network et al. v. U.S. Immigration and Customs Enforcement Agency, et al., 2012 U.S. Dist. LEXIS 97863 (S.D.N.Y. July 13, 2012), deals largely with searches in the context of the Freedom of Information Act (“FOIA”), Judge Scheindlin noted that “much of the logic behind . . . e-discovery searches is instructive in the FOIA search context because it educates litigants and the courts about the types of searches that are or are not likely to uncover all responsive documents.”
Plaintiffs sought records from five government agencies concerning the “opt-out” provision of the “Secure Communities” immigration enforcement program run by the U.S. Immigration and Customs Enforcement Agency (“ICE”). In cross-moving for summary judgment, each of the defendant agencies filed declarations attesting to the adequacy of its search for the requested records. The Court, however, ordered most of the defendant agencies to conduct additional searches because, among other deficiencies, they had failed to follow up on obvious leads, search archived records, or adequately describe the extent of their searches. In one striking example, a defendant “absurd[ly]” interpreted a custodian’s failure to respond to a request for records as proof that no responsive documents existed.
This decision underscores the dangers of custodian self-collection in the context of e-discovery. In particular, the Court emphasized two reasons why it could not “simply trust” assertions by defendants that their custodians “have designed and conducted a reasonable search” that could be “reasonably calculated to uncover all relevant documents.” First, because many of the defendants’ affidavits did not “record and report the search terms that they used, how they combined them, and whether they searched the full text of documents,” the affidavits lacked a “reasonable specificity of detail” and thus failed to establish that an adequate search was conducted. Second, while most custodians are familiar with “[s]earching for an answer on Google (or Westlaw or Lexis),” the Court concluded that defendants’ custodians lacked the skills necessary to “design legally sufficient electronic searches in the discovery or FOIA contexts” because it was “not part of their daily responsibilities.”
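The record-keeping the Court found missing (which terms were used, how they were combined, and whether full text was searched) can be illustrated with a minimal Python sketch. Everything in it is hypothetical and not drawn from the opinion or from any agency's actual process: the `SearchLog` structure, the `run_logged_search` function, and the simple substring matching are assumptions chosen only to show how a search could document itself as it runs.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class SearchLog:
    """Hypothetical record of one custodian search, kept for later reporting."""
    custodian: str
    terms: list
    operator: str      # "AND" or "OR": how the terms were combined
    full_text: bool    # True if document bodies were searched, not only titles
    hits: list = field(default_factory=list)


def run_logged_search(custodian, documents, terms, operator="OR", full_text=True):
    """Search documents and return a SearchLog describing what was done.

    documents maps a document id to {"title": str, "body": str}.
    Matching is a case-insensitive substring test (an illustrative assumption).
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    combine = all if operator == "AND" else any
    log = SearchLog(custodian, list(terms), operator, full_text)
    for doc_id, doc in documents.items():
        text = doc["title"]
        if full_text:
            text += " " + doc["body"]
        text = text.lower()
        if combine(term.lower() in text for term in terms):
            log.hits.append(doc_id)
    return log


if __name__ == "__main__":
    # Invented sample documents, for illustration only.
    docs = {
        "memo-1": {"title": "Program update",
                   "body": "States may opt out of Secure Communities."},
        "memo-2": {"title": "Opt out policy", "body": "Draft guidance."},
    }
    log = run_logged_search("J. Doe", docs,
                            ["opt out", "secure communities"], operator="AND")
    # The serialized log is the kind of detail a declaration could report.
    print(json.dumps(asdict(log), indent=2))
```

A log like this would not make a custodian-designed search adequate, but it would at least let counsel and the court see what was actually searched and how.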
Moreover, the Court stressed the inadequacies of keyword searches, opining that “[e]ven in the simplest case . . . there is no guarantee that using keywords will always prove sufficient.” The Court also pointed to the lack of supervision over custodial keyword searches, which were often performed by laypersons. Because of these limitations, Judge Scheindlin cautioned that courts and parties must move “beyond the use of keyword searches” and “rely on latent semantic indexing, statistical probability models, and . . . iterative learning,” an approach generally known as predictive coding, in order to “significantly increase the effectiveness and efficiency of searches.”
Although the case deals largely with searches in the FOIA context, Judge Scheindlin has sent yet another characteristically direct warning: all parties in discovery must recognize the limitations of untested keyword searches, guard against inadequate supervision of custodial searches, and learn to use twenty-first century technologies to perform adequate e-discovery searches.