Search Results

Search found 128 results on 6 pages for 'yan'.

Page 6/6 | < Previous Page | 2 3 4 5 6 

  • CLSF & CLK 2013 Trip Report by Jeff Liu

    - by jamesmorris
    This is a contributed post from Jeff Liu, lead XFS developer for the Oracle mainline Linux kernel team. Recently, I attended both the China Linux Storage and Filesystem workshop (CLSF), and the China Linux Kernel conference (CLK), which were held in Shanghai. Here are the highlights for both events. CLSF - 17th October XFS update (led by Jeff Liu) XFS keeps rapid progress with a lot of changes, especially focused on the infrastructure/performance improvements as well as  new feature development.  This can be reflected with a sample statistics among XFS/Ext4+JBD2/Btrfs via: # git diff --stat --minimal -C -M v3.7..v3.12-rc4 -- fs/xfs|fs/ext4+fs/jbd2|fs/btrfs XFS: 141 files changed, 27598 insertions(+), 19113 deletions(-) Ext4+JBD2: 39 files changed, 10487 insertions(+), 5454 deletions(-) Btrfs: 70 files changed, 19875 insertions(+), 8130 deletions(-) What made up those changes in XFS? Self-describing metadata(CRC32c). This is a new feature and it contributed about 70% code changes, it can be enabled via `mkfs.xfs -m crc=1 /dev/xxx` for v5 superblock. Transaction log space reservation improvements. With this change, we can calculate the log space reservation at mount time rather than runtime to reduce the the CPU overhead. User namespace support. So both XFS and USERNS can be enabled on kernel configuration begin from Linux 3.10. Thanks Dwight Engen's efforts for this thing. Split project/group quota inodes. Originally, project quota can not be enabled with group quota at the same time because they were share the same quota file inode, now it works but only for v5 super block. i.e, CRC enabled. CONFIG_XFS_WARN, an new lightweight runtime debugger which can be deployed in production environment. Readahead log object recovery, this change can speed up the log replay progress significantly. Speculative preallocation inode tracking, clearing and throttling. The main purpose is to deal with inodes with post-EOF space due to speculative preallocation, support improved quota management to free up a significant amount of unwritten space when at or near EDQUOT. It support backgroup scanning which occurs on a longish interval(5 mins by default, tunable), and on-demand scanning/trimming via ioctl(2). Bitter arguments ensued from this session, especially for the comparison between Ext4 and Btrfs in different areas, I have to spent a whole morning of the 1st day answering those questions. We basically agreed on XFS is the best choice in Linux nowadays because: Stable, XFS has a good record in stability in the past 10 years. Fengguang Wu who lead the 0-day kernel test project also said that he has observed less error than other filesystems in the past 1+ years, I own it to the XFS upstream code reviewer, they always performing serious code review as well as testing. Good performance for large/small files, XFS does not works very well for small files has already been an old story for years. Best choice (maybe) for distributed PB filesystems. e.g, Ceph recommends delopy OSD daemon on XFS because Ext4 has limited xattr size. Best choice for large storage (>16TB). Ext4 does not support a single file more than around 15.95TB. Scalability, any objection to XFS is best in this point? :) XFS is better to deal with transaction concurrency than Ext4, why? The maximum size of the log in XFS is 2038MB compare to 128MB in Ext4. Misc. Ext4 is widely used and it has been proved fast/stable in various loads and scenarios, XFS just need more customers, and Btrfs is still on the road to be a manhood. Ceph Introduction (Led by Li Wang) This a hot topic.  Li gave us a nice introduction about the design as well as their current works. Actually, Ceph client has been included in Linux kernel since 2.6.34 and supported by Openstack since Folsom but it seems that it has not yet been widely deployment in production environment. Their major work is focus on the inline data support to separate the metadata and data storage, reduce the file access time, i.e, a file access need communication twice, fetch the metadata from MDS and then get data from OSD, and also, the small file access is limited by the network latency. The solution is, for the small files they would like to store the data at metadata so that when accessing a small file, the metadata server can push both metadata and data to the client at the same time. In this way, they can reduce the overhead of calculating the data offset and save the communication to OSD. For this feature, they have only run some small scale testing but really saw noticeable improvements. Test environment: Intel 2 CPU 12 Core, 64GB RAM, Ubuntu 12.04, Ceph 0.56.6 with 200GB SATA disk, 15 OSD, 1 MDS, 1 MON. The sequence read performance for 1K size files improved about 50%. I have asked Li and Zheng Yan (the core developer of Ceph, who also worked on Btrfs) whether Ceph is really stable and can be deployed at production environment for large scale PB level storage, but they can not give a positive answer, looks Ceph even does not spread over Dreamhost (subject to confirmation). From Li, they only deployed Ceph for a small scale storage(32 nodes) although they'd like to try 6000 nodes in the future. Improve Linux swap for Flash storage (led by Shaohua Li) Because of high density, low power and low price, flash storage (SSD) is a good candidate to partially replace DRAM. A quick answer for this is using SSD as swap. But Linux swap is designed for slow hard disk storage, so there are a lot of challenges to efficiently use SSD for swap. SWAPOUT swap_map scan swap_map is the in-memory data structure to track swap disk usage, but it is a slow linear scan. It will become a bottleneck while finding many adjacent pages in the use of SSD. Shaohua Li have changed it to a cluster(128K) list, resulting in O(1) algorithm. However, this apporoach needs restrictive cluster alignment and only enabled for SSD. IO pattern In most cases, the swap io is in interleaved pattern because of mutiple reclaimers or a free cluster is shared by all reclaimers. Even though block layer can merge interleaved IO to some extent, but we cannot count on it completely. Hence the per-cpu cluster is added base on the previous change, it can help reclaimer do sequential IO and the block layer will be easier to merge IO. TLB flush: If we're reclaiming one active page, we should first move the page from active lru list to inactive lru list, and then reclaim the page from inactive lru to swap it out. During the process, we need to clear PTE twice: first is 'A'(ACCESS) bit, second is 'P'(PRESENT) bit. Processors need to send lots of ipi which make the TLB flush really expensive. Some works have been done to improve this, including rework smp_call_functiom_many() or remove the first TLB flush in x86, but there still have some arguments here and only parts of works have been pushed to mainline. SWAPIN: Page fault does iodepth=1 sync io, but it's a little waste if only issue a page size's IO. The obvious solution is doing swap readahead. But the current in-kernel swap readahead is arbitary(always 8 pages), and it always doesn't perform well for both random and sequential access workload. Shaohua introduced a new flag for madvise(MADV_WILLNEED) to do swap prefetch, so the changes happen in userspace API and leave the in-kernel readahead unchanged(but I think some improvement can also be done here). SWAP discard As we know, discard is important for SSD write throughout, but the current swap discard implementation is synchronous. He changed it to async discard which allow discard and write run in the same time. Meanwhile, the unit of discard is also optimized to cluster. Misc: lock contention For many concurrent swapout and swapin , the lock contention such as anon_vma or swap_lock is high, so he changed the swap_lock to a per-swap lock. But there still have some lock contention in very high speed SSD because of swapcache address_space lock. Zproject (led by Bob Liu) Bob gave us a very nice introduction about the current memory compression status. Now there are 3 projects(zswap/zram/zcache) which all aim at smooth swap IO storm and promote performance, but they all have their own pros and cons. ZSWAP It is implemented based on frontswap API and it uses a dynamic allocater named Zbud to allocate free pages. Zbud means pairs of zpages are "buddied" and it can only store at most two compressed pages in one page frame, so the max compress ratio is 50%. Each page frame is lru-linked and can do shink in memory pressure. If the compressed memory pool reach its limitation, shink or reclaim happens. It decompress the page frame into two new allocated pages and then write them to real swap device, but it can fail when allocating the two pages. ZRAM Acts as a compressed ramdisk and used as swap device, and it use zsmalloc as its allocator which has high density but may have fragmentation issues. Besides, page reclaim is hard since it will need more pages to uncompress and free just one page. ZRAM is preferred by embedded system which may not have any real swap device. Now both ZRAM and ZSWAP are in driver/staging tree, and in the mm community there are some disscussions of merging ZRAM into ZSWAP or viceversa, but no agreement yet. ZCACHE Handles file page compression but it is removed out of staging recently. From industry (led by Tang Jie, LSI) An LSI engineer introduced several new produces to us. The first is raid5/6 cards that it use full stripe writes to improve performance. The 2nd one he introduced is SandForce flash controller, who can understand data file types (data entropy) to reduce write amplification (WA) for nearly all writes. It's called DuraWrite and typical WA is 0.5. What's more, if enable its Dynamic Logical Capacity function module, the controller can do data compression which is transparent to upper layer. LSI testing shows that with this virtual capacity enables 1x TB drive can support up to 2x TB capacity, but the application must monitor free flash space to maintain optimal performance and to guard against free flash space exhaustion. He said the most useful application is for datebase. Another thing I think it's worth to mention is that a NV-DRAM memory in NMR/Raptor which is directly exposed to host system. Applications can directly access the NV-DRAM via a memory address - using standard system call mmap(). He said that it is very useful for database logging now. This kind of NVM produces are beginning to appear in recent years, and it is said that Samsung is building a research center in China for related produces. IMHO, NVM will bring an effect to current os layer especially on file system, e.g. its journaling may need to redesign to fully utilize these nonvolatile memory. OCFS2 (led by Canquan Shen) Without a doubt, HuaWei is the biggest contributor to OCFS2 in the past two years. They have posted 46 upstream patches and 39 patches have been merged. Their current project is based on 32/64 nodes cluster, but they also tried 128 nodes at the experimental stage. The major work they are working is to support ATS (atomic test and set), it can be works with DLM at the same time. Looks this idea is inspired by the vmware VMFS locking, i.e, http://blogs.vmware.com/vsphere/2012/05/vmfs-locking-uncovered.html CLK - 18th October 2013 Improving Linux Development with Better Tools (Andi Kleen) This talk focused on how to find/solve bugs along with the Linux complexity growing. Generally, we can do this with the following kind of tools: Static code checkers tools. e.g, sparse, smatch, coccinelle, clang checker, checkpatch, gcc -W/LTO, stanse. This can help check a lot of things, simple mistakes, complex problems, but the challenges are: some are very slow, false positives, may need a concentrated effort to get false positives down. Especially, no static checker I found can follow indirect calls (“OO in C”, common in kernel): struct foo_ops { int (*do_foo)(struct foo *obj); } foo->do_foo(foo); Dynamic runtime checkers, e.g, thread checkers, kmemcheck, lockdep. Ideally all kernel code would come with a test suite, then someone could run all the dynamic checkers. Fuzzers/test suites. e.g, Trinity is a great tool, it finds many bugs, but needs manual model for each syscall. Modern fuzzers around using automatic feedback, but notfor kernel yet: http://taviso.decsystem.org/making_software_dumber.pdf Debuggers/Tracers to understand code, e.g, ftrace, can dump on events/oops/custom triggers, but still too much overhead in many cases to run always during debug. Tools to read/understand source, e.g, grep/cscope work great for many cases, but do not understand indirect pointers (OO in C model used in kernel), give us all “do_foo” instances: struct foo_ops { int (*do_foo)(struct foo *obj); } = { .do_foo = my_foo }; foo>do_foo(foo); That would be great to have a cscope like tool that understands this based on types/initializers XFS: The High Performance Enterprise File System (Jeff Liu) [slides] I gave a talk for introducing the disk layout, unique features, as well as the recent changes.   The slides include some charts to reflect the performances between XFS/Btrfs/Ext4 for small files. About a dozen users raised their hands when I asking who has experienced with XFS. I remembered that when I asked the same question in LinuxCon/Japan, only 3 people raised their hands, but they are Chris Mason, Ric Wheeler, and another attendee. The attendee questions were mainly focused on stability, and comparison with other file systems. Linux Containers (Feng Gao) The speaker introduced us that the purpose for those kind of namespaces, include mount/UTS/IPC/Network/Pid/User, as well as the system API/ABI. For the userspace tools, He mainly focus on the Libvirt LXC rather than us(LXC). Libvirt LXC is another userspace container management tool, implemented as one type of libvirt driver, it can manage containers, create namespace, create private filesystem layout for container, Create devices for container and setup resources controller via cgroup. In this talk, Feng also mentioned another two possible new namespaces in the future, the 1st is the audit, but not sure if it should be assigned to user namespace or not. Another is about syslog, but the question is do we really need it? In-memory Compression (Bob Liu) Same as CLSF, a nice introduction that I have already mentioned above. Misc There were some other talks related to ACPI based memory hotplug, smart wake-affinity in scheduler etc., but my head is not big enough to record all those things. -- Jeff Liu

    Read the article

  • Testing Entity Framework applications, pt. 3: NDbUnit

    - by Thomas Weller
    This is the third of a three part series that deals with the issue of faking test data in the context of a legacy app that was built with Microsoft's Entity Framework (EF) on top of an MS SQL Server database – a scenario that can be found very often. Please read the first part for a description of the sample application, a discussion of some general aspects of unit testing in a database context, and of some more specific aspects of the here discussed EF/MSSQL combination. Lately, I wondered how you would ‘mock’ the data layer of a legacy application, when this data layer is made up of an MS Entity Framework (EF) model in combination with a MS SQL Server database. Originally, this question came up in the context of how you could enable higher-level integration tests (automated UI tests, to be exact) for a legacy application that uses this EF/MSSQL combo as its data store mechanism – a not so uncommon scenario. The question sparked my interest, and I decided to dive into it somewhat deeper. What I've found out is, in short, that it's not very easy and straightforward to do it – but it can be done. The two strategies that are best suited to fit the bill involve using either the (commercial) Typemock Isolator tool or the (free) NDbUnit framework. The use of Typemock was discussed in the previous post, this post now will present the NDbUnit approach... NDbUnit is an Apache 2.0-licensed open-source project, and like so many other Nxxx tools and frameworks, it is basically a C#/.NET port of the corresponding Java version (DbUnit namely). In short, it helps you in flexibly managing the state of a database in that it lets you easily perform basic operations (like e.g. Insert, Delete, Refresh, DeleteAll)  against your database and, most notably, lets you feed it with data from external xml files. Let's have a look at how things can be done with the help of this framework. Preparing the test data Compared to Typemock, using NDbUnit implies a totally different approach to meet our testing needs.  So the here described testing scenario requires an instance of an SQL Server database in operation, and it also means that the Entity Framework model that sits on top of this database is completely unaffected. First things first: For its interactions with the database, NDbUnit relies on a .NET Dataset xsd file. See Step 1 of their Quick Start Guide for a description of how to create one. With this prerequisite in place then, the test fixture's setup code could look something like this: [TestFixture, TestsOn(typeof(PersonRepository))] [Metadata("NDbUnit Quickstart URL",           "http://code.google.com/p/ndbunit/wiki/QuickStartGuide")] [Description("Uses the NDbUnit library to provide test data to a local database.")] public class PersonRepositoryFixture {     #region Constants     private const string XmlSchema = @"..\..\TestData\School.xsd";     #endregion // Constants     #region Fields     private SchoolEntities _schoolContext;     private PersonRepository _personRepository;     private INDbUnitTest _database;     #endregion // Fields     #region Setup/TearDown     [FixtureSetUp]     public void FixtureSetUp()     {         var connectionString = ConfigurationManager.ConnectionStrings["School_Test"].ConnectionString;         _database = new SqlDbUnitTest(connectionString);         _database.ReadXmlSchema(XmlSchema);         var entityConnectionStringBuilder = new EntityConnectionStringBuilder         {             Metadata = "res://*/School.csdl|res://*/School.ssdl|res://*/School.msl",             Provider = "System.Data.SqlClient",             ProviderConnectionString = connectionString         };         _schoolContext = new SchoolEntities(entityConnectionStringBuilder.ConnectionString);         _personRepository = new PersonRepository(this._schoolContext);     }     [FixtureTearDown]     public void FixtureTearDown()     {         _database.PerformDbOperation(DbOperationFlag.DeleteAll);         _schoolContext.Dispose();     }     ...  As you can see, there is slightly more fixture setup code involved if your tests are using NDbUnit to provide the test data: Because we're dealing with a physical database instance here, we first need to pick up the test-specific connection string from the test assemblies' App.config, then initialize an NDbUnit helper object with this connection along with the provided xsd file, and also set up the SchoolEntities and the PersonRepository instances accordingly. The _database field (an instance of the INdUnitTest interface) will be our single access point to the underlying database: We use it to perform all the required operations against the data store. To have a flexible mechanism to easily insert data into the database, we can write a helper method like this: private void InsertTestData(params string[] dataFileNames) {     _database.PerformDbOperation(DbOperationFlag.DeleteAll);     if (dataFileNames == null)     {         return;     }     try     {         foreach (string fileName in dataFileNames)         {             if (!File.Exists(fileName))             {                 throw new FileNotFoundException(Path.GetFullPath(fileName));             }             _database.ReadXml(fileName);             _database.PerformDbOperation(DbOperationFlag.InsertIdentity);         }     }     catch     {         _database.PerformDbOperation(DbOperationFlag.DeleteAll);         throw;     } } This lets us easily insert test data from xml files, in any number and in a  controlled order (which is important because we eventually must fulfill referential constraints, or we must account for some other stuff that imposes a specific ordering on data insertion). Again, as with Typemock, I won't go into API details here. - Unfortunately, there isn't too much documentation for NDbUnit anyway, other than the already mentioned Quick Start Guide (and the source code itself, of course) - a not so uncommon problem with smaller Open Source Projects. Last not least, we need to provide the required test data in xml form. A snippet for data from the People table might look like this, for example: <?xml version="1.0" encoding="utf-8" ?> <School xmlns="http://tempuri.org/School.xsd">   <Person>     <PersonID>1</PersonID>     <LastName>Abercrombie</LastName>     <FirstName>Kim</FirstName>     <HireDate>1995-03-11T00:00:00</HireDate>   </Person>   <Person>     <PersonID>2</PersonID>     <LastName>Barzdukas</LastName>     <FirstName>Gytis</FirstName>     <EnrollmentDate>2005-09-01T00:00:00</EnrollmentDate>   </Person>   <Person>     ... You can also have data from various tables in one single xml file, if that's appropriate for you (but beware of the already mentioned ordering issues). It's true that your test assembly may end up with dozens of such xml files, each containing quite a big amount of text data. But because the files are of very low complexity, and with the help of a little bit of Copy/Paste and Excel magic, this appears to be well manageable. Executing some basic tests Here are some of the possible tests that can be written with the above preparations in place: private const string People = @"..\..\TestData\School.People.xml"; ... [Test, MultipleAsserts, TestsOn("PersonRepository.GetNameList")] public void GetNameList_ListOrdering_ReturnsTheExpectedFullNames() {     InsertTestData(People);     List<string> names =         _personRepository.GetNameList(NameOrdering.List);     Assert.Count(34, names);     Assert.AreEqual("Abercrombie, Kim", names.First());     Assert.AreEqual("Zheng, Roger", names.Last()); } [Test, MultipleAsserts, TestsOn("PersonRepository.GetNameList")] [DependsOn("RemovePerson_CalledOnce_DecreasesCountByOne")] public void GetNameList_NormalOrdering_ReturnsTheExpectedFullNames() {     InsertTestData(People);     List<string> names =         _personRepository.GetNameList(NameOrdering.Normal);     Assert.Count(34, names);     Assert.AreEqual("Alexandra Walker", names.First());     Assert.AreEqual("Yan Li", names.Last()); } [Test, TestsOn("PersonRepository.AddPerson")] public void AddPerson_CalledOnce_IncreasesCountByOne() {     InsertTestData(People);     int count = _personRepository.Count;     _personRepository.AddPerson(new Person { FirstName = "Thomas", LastName = "Weller" });     Assert.AreEqual(count + 1, _personRepository.Count); } [Test, TestsOn("PersonRepository.RemovePerson")] public void RemovePerson_CalledOnce_DecreasesCountByOne() {     InsertTestData(People);     int count = _personRepository.Count;     _personRepository.RemovePerson(new Person { PersonID = 33 });     Assert.AreEqual(count - 1, _personRepository.Count); } Not much difference here compared to the corresponding Typemock versions, except that we had to do a bit more preparational work (and also it was harder to get the required knowledge). But this picture changes quite dramatically if we look at some more demanding test cases: Ok, and what if things are becoming somewhat more complex? Tests like the above ones represent the 'easy' scenarios. They may account for the biggest portion of real-world use cases of the application, and they are important to make sure that it is generally sound. But usually, all these nasty little bugs originate from the more complex parts of our code, or they occur when something goes wrong. So, for a testing strategy to be of real practical use, it is especially important to see how easy or difficult it is to mimick a scenario which represents a more complex or exceptional case. The following test, for example, deals with the case that there is some sort of invalid input from the caller: [Test, MultipleAsserts, TestsOn("PersonRepository.GetCourseMembers")] [Row(null, typeof(ArgumentNullException))] [Row("", typeof(ArgumentException))] [Row("NotExistingCourse", typeof(ArgumentException))] public void GetCourseMembers_WithGivenVariousInvalidValues_Throws(string courseTitle, Type expectedInnerExceptionType) {     var exception = Assert.Throws<RepositoryException>(() =>                                 _personRepository.GetCourseMembers(courseTitle));     Assert.IsInstanceOfType(expectedInnerExceptionType, exception.InnerException); } Apparently, this test doesn't need an 'Arrange' part at all (see here for the same test with the Typemock tool). It acts just like any other client code, and all the required business logic comes from the database itself. This doesn't always necessarily mean that there is less complexity, but only that the complexity happens in a different part of your test resources (in the xml files namely, where you sometimes have to spend a lot of effort for carefully preparing the required test data). Another example, which relies on an underlying 1-n relationship, might be this: [Test, MultipleAsserts, TestsOn("PersonRepository.GetCourseMembers")] public void GetCourseMembers_WhenGivenAnExistingCourse_ReturnsListOfStudents() {     InsertTestData(People, Course, Department, StudentGrade);     List<Person> persons = _personRepository.GetCourseMembers("Macroeconomics");     Assert.Count(4, persons);     Assert.ForAll(         persons,         @p => new[] { 10, 11, 12, 14 }.Contains(@p.PersonID),         "Person has none of the expected IDs."); } If you compare this test to its corresponding Typemock version, you immediately see that the test itself is much simpler, easier to read, and thus much more intention-revealing. The complexity here lies hidden behind the call to the InsertTestData() helper method and the content of the used xml files with the test data. And also note that you might have to provide additional data which are not even directly relevant to your test, but are required only to fulfill some integrity needs of the underlying database. Conclusion The first thing to notice when comparing the NDbUnit approach to its Typemock counterpart obviously deals with performance: Of course, NDbUnit is much slower than Typemock. Technically,  it doesn't even make sense to compare the two tools. But practically, it may well play a role and could or could not be an issue, depending on how much tests you have of this kind, how often you run them, and what role they play in your development cycle. Also, because the dataset from the required xsd file must fully match the database schema (even in parts that otherwise wouldn't be relevant to you), it can be quite cumbersome to be in a team where different people are working with the database in parallel. My personal experience is – as already said in the first part – that Typemock gives you a better development experience in a 'dynamic' scenario (when you're working in some kind of TDD-style, you're oftentimes executing the tests from your dev box, and your database schema changes frequently), whereas the NDbUnit approach is a good and solid solution in more 'static' development scenarios (when you need to execute the tests less frequently or only on a separate build server, and/or the underlying database schema can be kept relatively stable), for example some variations of higher-level integration or User-Acceptance tests. But in any case, opening Entity Framework based applications for testing requires a fair amount of resources, planning, and preparational work – it's definitely not the kind of stuff that you would call 'easy to test'. Hopefully, future versions of EF will take testing concerns into account. Otherwise, I don't see too much of a future for the framework in the long run, even though it's quite popular at the moment... The sample solution A sample solution (VS 2010) with the code from this article series is available via my Bitbucket account from here (Bitbucket is a hosting site for Mercurial repositories. The repositories may also be accessed with the Git and Subversion SCMs - consult the documentation for details. In addition, it is possible to download the solution simply as a zipped archive – via the 'get source' button on the very right.). The solution contains some more tests against the PersonRepository class, which are not shown here. Also, it contains database scripts to create and fill the School sample database. To compile and run, the solution expects the Gallio/MbUnit framework to be installed (which is free and can be downloaded from here), the NDbUnit framework (which is also free and can be downloaded from here), and the Typemock Isolator tool (a fully functional 30day-trial is available here). Moreover, you will need an instance of the Microsoft SQL Server DBMS, and you will have to adapt the connection strings in the test projects App.config files accordingly.

    Read the article

< Previous Page | 2 3 4 5 6