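I am new to Lucene and have only just started with search engines. Having learned a little, I wanted to build a small tool that searches Java source files by keyword: for example, enter String to find which Java source files use that class.
The idea goes back to when I first learned Java. The CD that came with a Java basics textbook included a program written by the author that let beginners look up which examples a given class appeared in. I did not pay much attention at the time and only thought the author's code was very long, so now I want to write such a program myself.
Development tools and runtime environment: the Lucene package and the JDK, running on Windows XP.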
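Analysis and design
Apart from the necessary Lucene operations, the program consists of basic IO. Because it has to index every Java source file in a directory and all of its subdirectories, it uses recursion, and it must also filter out files that are not Java source files. With that in mind I designed the following classes:
Main classes: the indexing class (IndexJavaFiles) and the search class (SearchJavaFiles)
Exception classes: the indexing exception (IndexException) and the search exception (SearchException)
There is also a file filter factory class (FileFilterFactory).
The exception classes are not strictly necessary; they exist to wrap IO exceptions, file exceptions, and Lucene exceptions. The file filter factory is not there for show either: I simply did not want too much code in one place, so the file filter lives in its own class. The complete, commented code follows.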
IndexJavaFiles.java
/**
 * Index the Java source files.
 */
package powerwind;

import java.io.*;
import java.util.Date;

import org.apache.lucene.document.*;
import org.apache.lucene.index.IndexWriter;

/**
 * @author Powerwind
 * @version
 */
public class IndexJavaFiles {

    /**
     * Default constructor.
     */
    public IndexJavaFiles() {
    }

    /**
     * Private recursive method called by index, which guarantees that the
     * file it passes in is a directory rather than a regular file.
     *
     * @param writer
     * @param file
     * @param filter
     * @throws IndexException
     */
    private void indexDirectory(IndexWriter writer, File file, FileFilter filter) throws IndexException {
        if (file.isDirectory()) {
            // Fetch the files and subdirectories of this directory, filtered.
            File[] files = file.listFiles(filter);
            // listFiles returns null when the directory cannot be listed.
            if (files != null) {
                for (int i = 0; i < files.length; i++) {
                    indexDirectory(writer, files[i], filter);
                }
            }
        } else {
            try {
                // The file has already passed the filter at this point.
                writer.addDocument(parseFile(file));
                System.out.println("Added file: " + file);
            } catch (IOException ioe) {
                throw new IndexException(ioe.getMessage());
            }
        }
    }

    /**
     * If the argument is a file it is indexed directly; if it is a
     * directory it is handed to indexDirectory for recursion.
     *
     * @param writer
     * @param file
     * @param filter
     * @throws IndexException
     */
    public void index(IndexWriter writer, File file, FileFilter filter) throws IndexException {
        // Make sure the file exists and is readable.
        if (file.exists() && file.canRead()) {
            if (file.isDirectory()) {
                indexDirectory(writer, file, filter);
            } else if (filter.accept(file)) {
                try {
                    writer.addDocument(parseFile(file));
                    System.out.println("Added file: " + file);
                } catch (IOException ioe) {
                    throw new IndexException(ioe.getMessage());
                }
            } else {
                System.out.println("The specified file or directory is invalid; nothing was indexed.");
            }
        }
    }

    /**
     * Turns a File into a Document.
     *
     * @param file
     */
    private Document parseFile(File file) throws IndexException {
        Document doc = new Document();
        // The path is stored as-is so it can be shown in search results.
        doc.add(new Field("path", file.getAbsolutePath(), Field.Store.YES,
                Field.Index.UN_TOKENIZED));
        try {
            // The contents are tokenized and indexed from the reader, not stored.
            doc.add(new Field("contents", new FileReader(file)));
        } catch (FileNotFoundException fnfe) {
            throw new IndexException(fnfe.getMessage());
        }
        return doc;
    }
}
index(IndexWriter writer, File file, FileFilter filter) calls the private method indexDirectory(IndexWriter writer, File file, FileFilter filter) to index the files.
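The recursive, filtered traversal can be tried on its own, without Lucene. The following is a minimal sketch that uses only java.io; the class JavaSourceWalker and its members are hypothetical names introduced for illustration and are not part of the original program. Instead of adding documents to an index, it collects the matching files into a list.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the original program. */
final class JavaSourceWalker {

    /** Accepts directories (so the recursion can descend) and files ending in ".java". */
    static final FileFilter JAVA_OR_DIR = new FileFilter() {
        public boolean accept(File f) {
            return f.isDirectory() || f.getName().endsWith(".java");
        }
    };

    /** Collects every accepted regular file under root, depth first. */
    static List<File> collect(File root, FileFilter filter) {
        List<File> found = new ArrayList<File>();
        walk(root, filter, found);
        return found;
    }

    private static void walk(File file, FileFilter filter, List<File> found) {
        if (file.isDirectory()) {
            // listFiles returns null when the directory cannot be listed.
            File[] children = file.listFiles(filter);
            if (children != null) {
                for (File child : children) {
                    walk(child, filter, found);
                }
            }
        } else if (filter.accept(file)) {
            found.add(file);
        }
    }
}
```

Calling JavaSourceWalker.collect(new File("src"), JavaSourceWalker.JAVA_OR_DIR) would return the .java files below a directory named src; the directory name here is only an example.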
Below is the IndexException exception class.
IndexException.java
package powerwind;

public class IndexException extends Exception {

    public IndexException(String message) {
        super("Throw IndexException while indexing files: " + message);
    }
}
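The wrapper above keeps only the message of the underlying exception, so the original stack trace is lost. As a possible variation, and not something the original program does, the wrapper could also pass the cause to the Exception constructor. The names WrappedIndexException and CauseDemo below are hypothetical, and the failing step is simulated.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical variant of the wrapper exception that keeps the original cause. */
class WrappedIndexException extends Exception {
    WrappedIndexException(String message, Throwable cause) {
        super("Throw IndexException while indexing files: " + message, cause);
    }
}

final class CauseDemo {
    /** Simulates a failing IO step and wraps the failure together with its cause. */
    static void failingStep() throws WrappedIndexException {
        try {
            throw new FileNotFoundException("Missing.java");
        } catch (IOException ioe) {
            // Passing ioe as the cause preserves its type and stack trace.
            throw new WrappedIndexException(ioe.getMessage(), ioe);
        }
    }
}
```

A caller can then inspect getCause() to tell a missing file apart from other IO failures.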
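Below is the FileFilterFactory class, which returns a specific file filter (FileFilter).
FileFilterFactory.java
package powerwind;

import java.io.*;

public class FileFilterFactory {

    /**
     * Static anonymous inner class.
     */
    private static FileFilter filter = new FileFilter() {
        public boolean accept(File file) {
            long len;
            // Accept directories, and non-empty .java files below a size limit.
            // The two factors of the size limit are missing from the source.
            return file.isDirectory() ||
                    (file.getName().endsWith(".java") &&
                    ((len = file.length()) > 0) && len < ? * ?);
        }
    };

    public static FileFilter getFilter() {
        return filter;
    }
}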
The main method of IndexJavaFiles:
    /**
     * main method
     */
    public static void main(String[] args) throws Exception {
        IndexJavaFiles ijf = new IndexJavaFiles();
        Date start = new Date();
        try {
            IndexWriter writer = IndexWriterFactory.newInstance().createWriter("/index", true);
            System.out.println("Indexing ...");
            // The path argument of new File() is missing from the source.
            ijf.index(writer, new File(), FileFilterFactory.getFilter());
            System.out.println("Optimizing...");
            writer.optimize();
            writer.close();
            Date end = new Date();
            System.out.println(end.getTime() - start.getTime() + " total milliseconds");
        } catch (IOException e) {
            System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
        }
    }
SearchJavaFiles.java
package powerwind;

import java.io.*;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.search.*;

public class SearchJavaFiles {
    private IndexSearcher searcher;
    private QueryParser parser;

    /**
     * @param searcher
     */
    public SearchJavaFiles(IndexSearcher searcher) {
        this.searcher = searcher;
    }

    /**
     * @param field
     * @param analyzer
     */
    public void setParser(String field, Analyzer analyzer) {
        setParser(new QueryParser(field, analyzer));
    }

    /**
     * @param parser
     */
    public void setParser(QueryParser parser) {
        this.parser = parser;
    }

    /**
     * @param query
     * @return Hits
     * @throws SearchException
     */
    public Hits serach(Query query) throws SearchException {
        try {
            return searcher.search(query);
        } catch (IOException ioe) {
            throw new SearchException(ioe.getMessage());
        }
    }

    /**
     * @param queryString
     * @return Hits
     * @throws SearchException
     */
    public Hits serach(String queryString) throws SearchException {
        if (parser == null)
            throw new SearchException("parser is null!");
        try {
            return searcher.search(parser.parse(queryString));
        } catch (IOException ioe) {
            throw new SearchException(ioe.getMessage());
        } catch (ParseException pe) {
            throw new SearchException(pe.getMessage());
        }
    }

    /**
     * Prints the hits from start (inclusive) to end (exclusive).
     *
     * @param hits
     * @param start
     * @param end
     * @throws SearchException
     */
    public static Hits display(Hits hits, int start, int end) throws SearchException {
        try {
            while (start < end) {
                Document doc = hits.doc(start);
                String path = doc.get("path");
                // Results are numbered from 1 for display.
                if (path != null) {
                    System.out.println((start + 1) + ". " + path);
                } else {
                    System.out.println((start + 1) + ". " + "No such path");
                }
                start++;
            }
        } catch (IOException ioe) {
            throw new SearchException(ioe.getMessage());
        }
        return hits;
    }
The main method:
    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {
        String field = "contents";
        String index = "/index";
        // The page size value is missing from the source.
        final int rows_per_page = ;
        final char NO = 'n';
        SearchJavaFiles sjf = new SearchJavaFiles(new IndexSearcher(IndexReader.open(index)));
        sjf.setParser(field, new StandardAnalyzer());
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
        while (true) {
            System.out.println("Query: ");
            String line = in.readLine();
            // An empty line or end of input ends the session.
            if (line == null || line.length() < 1) {
                System.out.println("exit query");
                break;
            }
            Hits hits = sjf.serach(line);
            System.out.println("searching for " + line + " Result is ");
            int len = hits.length();
            int i = 0;
            if (len > 0)
                while (true) {
                    if (i + rows_per_page >= len) {
                        // Last page: print the remaining hits.
                        SearchJavaFiles.display(hits, i, len);
                        break;
                    } else {
                        // Print one page, then advance i to the next page.
                        SearchJavaFiles.display(hits, i, i += rows_per_page);
                        System.out.println("more y/n?");
                        line = in.readLine();
                        if (line == null || line.length() < 1 || line.charAt(0) == NO)
                            break;
                    }
                }
            else
                System.out.println("not found");
        }
    }
}
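The paging loop in main shows results one page at a time, treating each page as a half-open range from start (inclusive) to end (exclusive), the same convention display uses. The arithmetic can be checked in isolation. The sketch below is an illustration only: Pager and pages are hypothetical names, and the page size of 10 in the usage note is an assumed value, not taken from the original.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the original program. */
final class Pager {

    /** Returns {start, end} pairs (end exclusive) that cover total items, pageSize per page. */
    static List<int[]> pages(int total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<int[]> ranges = new ArrayList<int[]>();
        for (int start = 0; start < total; start += pageSize) {
            // The last page is shorter when total is not a multiple of pageSize.
            int end = Math.min(start + pageSize, total);
            ranges.add(new int[] { start, end });
        }
        return ranges;
    }
}
```

With 25 hits and an assumed page size of 10, pages(25, 10) yields the ranges 0 to 10, 10 to 20, and 20 to 25, which matches the calls to display that the loop would make if the user keeps answering y.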
SearchException.java
package powerwind;

public class SearchException extends Exception {

    public SearchException(String message) {
        super("Throw SearchException while searching files: " + message);
    }
}
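Ideas for improvement
File formats
Handle Zip and Jar files by indexing the Java source files inside them.
Index .class files through reflection.
Input and output
Besides console input and output, optionally read query keywords from a file and write the query results to a file.
User interface
A graphical interface in which double-clicking a result entry opens the corresponding file.
Performance
Use caching and multithreading when indexing files.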
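For the Zip and Jar idea, one possible starting point is java.util.zip.ZipFile, which also reads Jar files because they use the Zip format. The sketch below only lists the names of the .java entries in an archive; ArchiveSources and javaEntries are hypothetical names, and how the entries would be fed to the index is left open.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper for illustration; not part of the original program. */
final class ArchiveSources {

    /** Lists the names of all .java entries in a Zip or Jar archive. */
    static List<String> javaEntries(String archivePath) throws IOException {
        List<String> names = new ArrayList<String>();
        ZipFile zip = new ZipFile(archivePath);
        try {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Skip directory entries and anything that is not a Java source file.
                if (!entry.isDirectory() && entry.getName().endsWith(".java")) {
                    names.add(entry.getName());
                }
            }
        } finally {
            zip.close();
        }
        return names;
    }
}
```

To index the contents rather than just the names, zip.getInputStream(entry) returns a stream that could be wrapped in an InputStreamReader before the archive is closed.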
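For the multithreading idea, a fixed thread pool from java.util.concurrent is one way to run per-file work in parallel. The sketch below is generic and does not touch Lucene: ParallelTasks and runAll are hypothetical names, and each task is assumed to stand for the work done on one file, such as reading and parsing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper for illustration; not part of the original program. */
final class ParallelTasks {

    /** Runs every task on a fixed-size pool and returns the results in submission order. */
    static <T> List<T> runAll(List<Callable<T>> tasks, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until all tasks have completed.
            List<Future<T>> futures = pool.invokeAll(tasks);
            List<T> results = new ArrayList<T>();
            for (Future<T> future : futures) {
                // get() rethrows a task failure wrapped in an ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this speeds up indexing depends on where the time goes; if disk reads dominate, extra threads may help little.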