Publications by authors named "Kouichi Kimura"

The Burrows-Wheeler transform (BWT) of short-read data has unexplored potential uses, such as efficient and sensitive variation analysis against multiple reference genome sequences, because, unlike conventional mapping-based methods, it does not depend on any particular reference genome sequence. However, because the amount of read data is generally much larger than the reference sequence, computing the BWT of reads is not easy, and this hampers the development of potential applications. To alleviate this problem, a new method of computing the BWT of reads in parallel is proposed.

View Article and Find Full Text PDF

Aim: The present study investigated the effect of sarcopenia on short- and long-term surgical outcomes and identified potential prognostic factors for hepatocellular carcinoma (HCC) following hepatectomy among patients 70 years of age and older.

Methods: Patient data were retrospectively collected for 296 consecutive patients who underwent hepatectomy for HCC with curative intent. Patients were assigned to two groups according to age (younger than 70 years, and 70 years and older), and the presence of sarcopenia.

View Article and Find Full Text PDF

Background: The potential utility of the Burrows-Wheeler transform (BWT) of a large amount of short-read data ("reads") has not been fully studied. The BWT basically serves as a lossless dictionary of reads, unlike the heuristic and lossy reads-to-genome mapping results conventionally obtained in the first step of sequence analysis. Thus, it is naturally expected to lead to the development of sensitive methods for analysis of short-read data.
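The "lossless dictionary" view can be illustrated with the standard backward search on a BWT, which counts how often a pattern occurs in the reads without consulting any reference genome. This is a minimal sketch under stated assumptions: reads are joined with `$` separators, the BWT is built by naive rotation sorting, and the helper names `bwt_of_text` and `count_occurrences` are hypothetical. Rank queries use plain string counting, which is slow but keeps the example short.

```python
from collections import Counter


def bwt_of_text(text):
    """BWT by sorting all rotations (fine for tiny inputs only)."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count occurrences of `pattern` using backward search on the BWT."""
    counts = Counter(bwt_string)
    # first[c] = number of characters in the text that sort before c
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    lo, hi = 0, len(bwt_string)  # half-open interval of matching rows
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + bwt_string[:lo].count(c)
        hi = first[c] + bwt_string[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reads = ["ACGT", "CGTA", "GGAC"]
    index = bwt_of_text("$".join(reads) + "$")
    print(count_occurrences(index, "CGT"))  # 2
    print(count_occurrences(index, "TT"))   # 0
```

Because the pattern never contains `$`, a match cannot span two reads, so the count reflects occurrences inside individual reads only.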

View Article and Find Full Text PDF

Motivation: Sequence-variation analysis is conventionally performed on mapping results that are highly redundant and occasionally contain undesirable heuristic biases. A straightforward approach to single-nucleotide polymorphism (SNP) analysis, using the Burrows-Wheeler transform (BWT) of short-read data, is proposed.

Results: The BWT makes it possible to simultaneously process collections of read fragments of the same sequences; accordingly, SNPs were found from the BWT much faster than from the mapping results.
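One way to picture why read fragments of the same sequence can be processed together: in suffix-sorted order, all suffixes that begin with the same k-mer are adjacent, and the BWT characters of that interval are the bases seen immediately to the left of the k-mer across all reads. The sketch below flags k-mers whose left neighbours disagree. It is an illustration of the idea only, not the article's algorithm; the function name `snp_candidates`, the parameters `k` and `min_support`, and the use of explicit suffix sorting are assumptions, and reverse-complement reads and sequencing errors are ignored.

```python
from collections import Counter
from itertools import groupby


def snp_candidates(reads, k=4, min_support=2):
    """Return {k-mer: {base: count}} for k-mers preceded by >= 2 supported bases."""
    entries = []
    for read in reads:
        # Start at 1 so that every suffix has a preceding base in the read,
        # and stop early enough that every suffix is at least k long.
        for pos in range(1, len(read) - k + 1):
            entries.append((read[pos:], read[pos - 1]))
    entries.sort()  # suffix order, as in the rows of a BWT matrix

    candidates = {}
    # Suffixes sharing a k-mer prefix are contiguous after sorting.
    for context, group in groupby(entries, key=lambda e: e[0][:k]):
        bases = Counter(preceding for _, preceding in group)
        supported = {b: n for b, n in bases.items() if n >= min_support}
        if len(supported) >= 2:
            candidates[context] = supported
    return candidates


if __name__ == "__main__":
    allele_a = "GATTACAGGCT"
    allele_b = "GATTGCAGGCT"  # differs from allele_a at one position
    print(snp_candidates([allele_a, allele_a, allele_b, allele_b]))
    # {'CAGG': {'A': 2, 'G': 2}}
```

Repeated sequence in a genome can produce the same signal, so a disagreement of this kind is only a candidate that would need further filtering.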

View Article and Find Full Text PDF

Myers' elegant and powerful bit-parallel dynamic programming algorithm for approximate string matching has a restriction that the query length should be within the word size of the computer, typically 64. We propose a modification of Myers' algorithm that places the restriction not on the query length but on the maximum number of mismatches (substitutions, insertions, or deletions), which should be less than half of the word size. The time complexity is O(m log |Σ|), where m is the query length and |Σ| is the size of the alphabet Σ.
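For context, the original bit-parallel algorithm can be sketched as follows, using the commonly cited Hyyrö formulation of Myers' recurrences. This is not the modification proposed in the article. Python integers are unbounded, so the word-size restriction does not show up here; in a fixed-width implementation each bit vector must fit in one machine word, which is the limitation the abstract describes. The function name `myers_search` and its return format are assumptions.

```python
def myers_search(pattern, text, max_errors):
    """Return (end_index, distance) for every text position where the pattern
    matches a substring ending there with edit distance <= max_errors."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1   # keeps every vector m bits wide
    high = 1 << (m - 1)   # bit of the last pattern row

    # peq[c] has bit i set when pattern[i] == c
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    pv, mv, score = mask, 0, m  # vertical +1 / -1 deltas, bottom-row score
    hits = []
    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)  # horizontal +1 deltas
        mh = pv & xh                   # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # No carry-in bit: the top DP row is all zeros in search mode.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= max_errors:
            hits.append((j, score))
    return hits


if __name__ == "__main__":
    print(myers_search("ab", "abx", 1))  # [(0, 1), (1, 0), (2, 1)]
```

Each text character is handled with a constant number of word operations, which is where the speed of the bit-parallel approach comes from.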

View Article and Find Full Text PDF

We introduce a new data structure, a localized suffix array, based on which occurrence information is dynamically represented as the combination of global positional information and local lexicographic order information in text search applications. For the search of a pair of words within a given distance, many candidate positions that share a coarse-grained global position can be compactly represented in terms of local lexicographic orders as in the conventional suffix array, and they can be simultaneously examined for violation of the distance constraint at the coarse-grained resolution. The trade-off between the positional and lexicographical information is progressively shifted towards finer positional resolution, and the distance constraint is reexamined accordingly.
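The problem being addressed can be made concrete with the conventional baseline: look up each word in an ordinary suffix array, then test every pair of positions against the distance constraint. The sketch below shows only that baseline, not the localized suffix array; the function names and the choice to measure distance between start positions are assumptions.

```python
def build_suffix_array(text):
    """Suffix array by direct sorting (adequate for short texts only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text, suffix_array, word):
    """Sorted start positions of `word`, found by binary search."""
    n = len(word)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + n]

    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose prefix is >= word
        mid = (lo + hi) // 2
        if prefix(mid) < word:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(suffix_array)
    while lo < hi:  # first suffix whose prefix is > word
        mid = (lo + hi) // 2
        if prefix(mid) <= word:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


def pairs_within(text, suffix_array, word_a, word_b, max_distance):
    """Pairs (i, j) of start positions with |i - j| <= max_distance."""
    starts_a = occurrences(text, suffix_array, word_a)
    starts_b = occurrences(text, suffix_array, word_b)
    return [(i, j) for i in starts_a for j in starts_b
            if abs(i - j) <= max_distance]


if __name__ == "__main__":
    text = "the cat sat on the mat near the cat"
    sa = build_suffix_array(text)
    print(pairs_within(text, sa, "the", "cat", 5))  # [(0, 4), (28, 32)]
```

The baseline enumerates every candidate pair individually; the abstract describes examining many candidates at once at a coarse positional resolution before refining.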

View Article and Find Full Text PDF

We analyzed the diversity of mRNA produced as a result of alternative splicing in order to evaluate gene function. First, we predicted the number of human genes transcribed into protein-coding mRNAs by using the sequence information of full-length cDNAs and 5'-ESTs and obtained 23,241 such human genes. Next, using these genes, we analyzed the mRNA diversity and consequently sequenced and identified 11,769 human full-length cDNAs whose predicted open reading frames were different from other known full-length cDNAs.

View Article and Find Full Text PDF

We have developed efficient in-practice algorithms for computing rank and select functions on a binary string, based on a novel data structure, a hierarchical binary string with hierarchical accumulatives. It efficiently stores decomposed information on partial summations over various scales of subregions of a given binary string, so that the required space overhead ratio is only about 3.5% irrespective of the string length.
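To show what rank and select compute and how stored partial sums speed them up, here is a minimal single-level sketch: one cumulative count per fixed-size block, plus a scan inside the block. It is not the hierarchical structure described in the article and makes no attempt to reproduce its space overhead; the class name `RankSelect` and the `block_size` parameter are assumptions.

```python
from bisect import bisect_left


class RankSelect:
    """Rank/select over a list of 0/1 values using per-block partial sums."""

    def __init__(self, bits, block_size=64):
        self.bits = list(bits)
        self.block_size = block_size
        # block_ranks[b] = number of 1s before block b
        self.block_ranks = [0]
        for start in range(0, len(self.bits), block_size):
            ones = sum(self.bits[start:start + block_size])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i):
        """Number of 1s in bits[0:i]."""
        if not 0 <= i <= len(self.bits):
            raise IndexError("rank position out of range")
        block, offset = divmod(i, self.block_size)
        start = block * self.block_size
        return self.block_ranks[block] + sum(self.bits[start:start + offset])

    def select1(self, k):
        """Index of the k-th 1 (k is 1-based)."""
        if not 1 <= k <= self.block_ranks[-1]:
            raise ValueError("k out of range")
        # Binary search for the block that contains the k-th 1.
        block = bisect_left(self.block_ranks, k) - 1
        remaining = k - self.block_ranks[block]
        start = block * self.block_size
        for pos in range(start, min(start + self.block_size, len(self.bits))):
            remaining -= self.bits[pos]
            if remaining == 0:
                return pos
        raise AssertionError("unreachable for consistent partial sums")


if __name__ == "__main__":
    rs = RankSelect([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block_size=4)
    print(rs.rank1(7), rs.select1(4))  # 4 6
```

Adding further levels of partial sums over larger regions follows the same pattern and shortens the search for the right block.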

View Article and Find Full Text PDF

Background: While recent advances in asthma management have enabled adequate control to be frequently achieved in outpatient settings, children whose asthma remains poorly controlled despite outpatient treatment are often referred to extended-stay hospitals. The aim of the present study was to examine trends concerning extended-stay hospitalization and to evaluate the present status of this approach.

Methods: A retrospective study was conducted to assess changes in the number of admissions among 408 children with extended stays at Kamiamakusa General Hospital between 1989 and 2005.

View Article and Find Full Text PDF

Completion of human genome sequencing has greatly accelerated functional genomic research. Full-length cDNA clones are essential experimental tools for functional analysis of human genes. In one of the projects of the New Energy and Industrial Technology Development Organization (NEDO) in Japan, the full-length human cDNA sequencing project (FLJ project), nucleotide sequences of approximately 30,000 human cDNA clones have been analyzed.

View Article and Find Full Text PDF

Appropriate resources and expression technology necessary for human proteomics on a whole-proteome scale are being developed. We prepared a foundation for simple and efficient production of human proteins using the versatile Gateway vector system. We generated 33,275 human Gateway entry clones for protein synthesis, developed mRNA expression protocols for them and improved the wheat germ cell-free protein synthesis system.

View Article and Find Full Text PDF

Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts.

View Article and Find Full Text PDF

By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by more than 500 bp and thus are very likely to constitute mutually distinct alternative promoters. To our surprise, at least 7,674 (52%) human RefSeq genes were subject to regulation by putative alternative promoters (PAPs).
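The separation criterion can be sketched as a simple gap-based grouping: sort the TSS coordinates and start a new cluster whenever the gap to the previous site exceeds 500 bp. This is an illustration under stated assumptions (coordinates from a single strand of a single gene locus, single-linkage grouping); the function name `cluster_positions` and the `max_gap` parameter are hypothetical, and the article's actual procedure may differ.

```python
def cluster_positions(positions, max_gap=500):
    """Group coordinates so that neighbouring clusters are > max_gap apart."""
    clusters = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # close enough: same cluster
        else:
            clusters.append([pos])    # gap too large: start a new cluster
    return clusters


if __name__ == "__main__":
    sites = [100, 150, 620, 1500, 1400, 5000]
    print(cluster_positions(sites))
    # [[100, 150, 620], [1400, 1500], [5000]]
```

A gene whose sites fall into two or more such clusters would, under this criterion, count as having putative alternative promoters.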

View Article and Find Full Text PDF
Article Synopsis
  • We used techniques like oligo-capping, translation start point prediction through ATGpr, and specific searches in the SWISS-PROT database to filter and select the cDNAs, ultimately identifying 789 potential candidates.
  • Out of the selected candidates, 334 were identified as novel cDNAs, with 88.3% forecasted to code for secretion or membrane proteins, including key elements like transporters and receptors that play crucial roles in cellular functions.
View Article and Find Full Text PDF

Occult bacteremia with Streptococcus pneumoniae (S. pneumoniae) is sometimes encountered in general clinics, while that with Haemophilus influenzae type b (Hib) is less common and usually progresses to serious central nervous system infection. Recently we encountered a patient with bacteremia due to Hib, in whom the bacteremia resolved spontaneously without intravenous antibiotic therapy.

View Article and Find Full Text PDF
Article Synopsis
  • The human genome contains significant biological potential, but understanding its full functionality is challenging due to limited knowledge of gene functions and variability in gene transcripts.
  • Researchers have characterized over 41,000 full-length cDNAs to enhance the understanding of gene structure and function, validating over 21,000 gene candidates and identifying more than 5,000 new ones.
  • The resulting human gene database (H-InvDB) offers extensive information about genes, including structures, alternative splicing, non-coding RNAs, and genetic variations, while also revealing potential inaccuracies in the existing human genome sequence.
View Article and Find Full Text PDF

As a base for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding.

View Article and Find Full Text PDF

Shiga toxin 2 (Stx2) variants have been found to exhibit not only antigenic divergence, but also differences in toxicity for tissue culture cells and animals. To clarify whether all or just a subset of Stx2 variants are important for the virulence of Shiga toxin-producing Escherichia coli, we designed PCR primers to detect and type all reported variants. We classified them into four groups according to the nucleotide sequences of the Stx2 family; for example, group 1 (G1) contains VT2vha and group 2 (G2) contains VT2d-Ount.

View Article and Find Full Text PDF